When I run a query on a large set of small objects (15k objects with only a few short string and boolean properties), without doing anything with these objects, I see my instance's memory usage increase continuously (a 70 MB increase). The increase looks out of proportion to the amount of data the query should ever need to keep in memory.
The loop I use is the following:
    cursor = None
    while True:
        query = MyModel.all()
        if cursor:
            query.with_cursor(cursor)
        fetched = 0
        for result in query.run(batch_size=500):
            fetched += 1
            # Do something with 'result' here. Actually leaving it empty for
            # testing to be sure I don't retain anything myself
            if fetched == 500:
                cursor = query.cursor()
                break
        else:
            break
To be sure this is not due to appstats, I call appstats.recording.dont_record() so that no stats are recorded.
Does anyone have any clue what might be going on? Or any pointers on how to debug/profile this?
Update 1: I turned on gc.set_debug(gc.DEBUG_STATS) in the production code, and I see the garbage collector being called regularly, so it is trying to collect garbage. When I call gc.collect() at the end of the loop (which is also the end of the request), it returns 0 and doesn't help.
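For reference, here is a minimal standard-library sketch of what that return value means. gc.collect() returns the number of unreachable objects it found, so a result of 0 suggests the memory is not tied up in uncollected reference cycles. The Node class and the collect_after_cycles helper are made up for this illustration and are not part of App Engine.

```python
import gc


class Node(object):
    """Tiny object used only to build reference cycles for this demo."""

    def __init__(self):
        self.other = None


def collect_after_cycles(num_cycles):
    """Create num_cycles two-object cycles, drop them, and return the
    results of two consecutive gc.collect() calls."""
    gc.collect()                      # clear any garbage that already exists
    was_enabled = gc.isenabled()
    gc.disable()                      # keep automatic collections out of the demo
    try:
        a = b = None
        for _ in range(num_cycles):
            a, b = Node(), Node()
            a.other, b.other = b, a   # a <-> b keep each other alive
        del a, b                      # now only the cycles reference the nodes
        first = gc.collect()          # number of unreachable objects found
        second = gc.collect()         # the cycles are already gone at this point
    finally:
        if was_enabled:
            gc.enable()
    return first, second
```

Calling collect_after_cycles(100) should report a non-zero first value and a much smaller second one (normally 0). A 0 on the very first call, as in my request, would mean there was no cyclic garbage to free in the first place.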
Update 2: I did some hacking to get guppy to work on dev_appserver, and this seemed to indicate that, after an explicit gc.collect() at the end of the loop, most of the memory was consumed by a 'dict of google.appengine.datastore.entity_pb.Property'.
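Where guppy is not available, a rough standard-library approximation of the same idea is to count the objects tracked by the garbage collector, grouped by type. This is only a sketch: live_object_counts is a hypothetical helper, it reports counts rather than bytes, and gc.get_objects() only sees container-like objects, so plain strings and numbers are not included.

```python
import gc
from collections import Counter


def live_object_counts():
    """Count the objects currently tracked by the garbage collector,
    grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())
```

Comparing live_object_counts().most_common(10) before and after the loop should show which types grew the most.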
Each model entity has some overhead.
For starters, your query returns objects as protobufs.
So you will have a series of batched protobufs for the result set.
Then it is decoded. Each decoded entity includes the property names as well as the data for that entity. You have 15K entities. How big are your property names, for instance?
So you have at least two copies of the result set in memory in various forms (possibly more), not including anything else you do with instances of the model class.
Your loop gives the garbage collector no opportunity to run, so collection can (and will) happen later.
Have a look at tools like apptrace to help with memory profiling.
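To get a feel for how much the repeated property names can add up to, here is a rough back-of-the-envelope sketch. It uses plain dicts as a stand-in for decoded entities, which is an assumption: the real entity_pb objects are laid out differently, so actual figures will differ. The helper estimate_entity_bytes and the property names in the usage note are hypothetical.

```python
import sys


def estimate_entity_bytes(num_entities, properties):
    """Rough estimate of the bytes held by num_entities dicts, assuming
    each one keeps its own copy of every property name and string value."""
    per_entity = sys.getsizeof(dict(properties))   # the dict itself
    for name, value in properties.items():
        per_entity += sys.getsizeof(name)          # name is repeated per entity
        if not isinstance(value, bool):
            per_entity += sys.getsizeof(value)     # True/False are shared singletons
    return per_entity * num_entities
```

For example, estimate_entity_bytes(15000, {'title': 'abc', 'owner': 'bob', 'active': True}) gives one such estimate for a single decoded copy of the result set. Multiply it by the number of forms the data is held in at once.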
I have reported this to the App Engine team, and they seem to confirm that this is actually a problem (suspected to be in the handling of cursors).
Source: https://stackoverflow.com/questions/31853703/google-app-engine-db-query-memory-usage