NDB using too much memory for a simple .put()?

March 29, 2017, at 11:54 PM

I have an App Engine app in Python (Flask), and I started tracing memory consumption after seeing errors in the console.

I've narrowed the culprit down to a simple .put() on an NDB entity, and I don't know how to fix the issue.

Here's the NDB model I use:

class MassiveEntry(ndb.Model):
    massive_id = ndb.IntegerProperty(indexed=True)
    row     = ndb.IntegerProperty(indexed=True)
    name    = ndb.StringProperty(indexed=False)
    address = ndb.StringProperty(indexed=False)
    email   = ndb.StringProperty(indexed=False, default=None)
    reason  = ndb.StringProperty(indexed=False, default=None)
    # Used for Tasks processing :
    contact_id = ndb.IntegerProperty(indexed=False, default=None)
    processing = ndb.BooleanProperty(name="is_processing", default=True, indexed=True)  # Only when searching contact processing is set to True
    treated = ndb.BooleanProperty(name="is_treated", default=False, indexed=True)

When a customer uploads a CSV file to my app, I read it and add as many MassiveEntry entities as the file has lines. After that, an asynchronous task reads the next available entry (processing = False, treated = False), sets entry.processing to True, and processes the data. Once it's done, it sets entry.treated to True and checks for the next available entry.
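For context, the upload step looks roughly like this. This is a minimal sketch, not the actual handler: it builds plain dicts mirroring the MassiveEntry fields (the CSV column names are assumptions), where the real app would create and put() one entity per line.

```python
import csv
import io

def rows_from_csv(csv_text, massive_id):
    """Sketch of the upload step: one entry dict per CSV line.

    In the real app each dict would instead become a MassiveEntry
    entity put() into the datastore with processing=False,
    treated=False, so the async task can pick it up later.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for i, row in enumerate(reader):
        entries.append({
            "massive_id": massive_id,
            "row": i,
            "name": row.get("name"),
            "address": row.get("address"),
            "email": row.get("email"),
            "processing": False,  # flipped to True when a task claims the row
            "treated": False,     # flipped to True once processing is done
        })
    return entries

sample = ("name,address,email\n"
          "Alice,1 Main St,a@example.com\n"
          "Bob,2 Oak Ave,b@example.com\n")
entries = rows_from_csv(sample, massive_id=42)
```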

I get the list of next available entries with this query:

entries = MassiveEntry.query().filter(MassiveEntry.massive_id == massive).filter(MassiveEntry.processing == False).filter(MassiveEntry.treated == False).iter()

(massive is an id from the database).

Then I loop over the results to load the next few available items.
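That "next few items" loop is just bounded iteration over the query iterator. With a plain iterable standing in for the NDB iter() result (the stand-in data and batch size are made up), the pattern looks like this:

```python
from itertools import islice

def next_batch(entries_iter, batch_size=10):
    """Take at most batch_size items from an iterator without
    materializing the rest, mirroring a bounded loop over
    ndb's query .iter() result."""
    return list(islice(entries_iter, batch_size))

# Stand-in for the query iterator: entries still to process.
pending = iter(range(25))
first = next_batch(pending, batch_size=10)
second = next_batch(pending, batch_size=10)
```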

I checked the memory stats while loading the entries: nothing changes. But when I do

entry.processing = True
entry.put()

the memory jumps by 10 MB! This happens when the related MassiveEntry.massive_id value has more than 50k entries.

To sum up:

# Memory stat: no changes
entries = MassiveEntry.query().filter(MassiveEntry.massive_id == massive).filter(MassiveEntry.processing == False).filter(MassiveEntry.treated == False).iter()
# Memory stat: no changes
for entry in entries:
    # Memory stat: no changes
    # Do some stuff
    # Memory stat: no changes
    entry.treated = True
    entry.put()  # HERE! Memory jumps by 10 MB, ONLY WHEN massive_id is related to around 50k rows in the datastore.
    # Memory stat: +10 MB!

Do you have any idea why? And how can I prevent it? I can provide more code if needed, but I think this is enough.
