Question
I have some code that iterates over DB entities and runs in a task - see below.
On App Engine I'm getting an "Exceeded soft private memory limit" error, and indeed checking memory_usage().current() confirms the problem. See below for the output of the logging statement. It seems that every time a batch of foos is fetched, the memory goes up.
My question is: why is the memory not being garbage collected? I would expect that in each iteration of the loops (the while loop and the for loop, respectively) the re-use of the names foos and foo would cause the objects to which foos and foo used to point to become 'de-referenced' (i.e. inaccessible) and therefore eligible for garbage collection, and then be garbage collected as memory gets tight. But evidently that is not happening.
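(For context, that expectation matches normal CPython behavior: rebinding a name drops one reference to the old object, and an object whose reference count hits zero is freed immediately, without waiting for memory pressure. A minimal sketch, independent of App Engine; note that sys.getrefcount counts its own argument as one extra reference:)

import sys

data = [0] * 1000
print(sys.getrefcount(data))   # 2: the name 'data' plus getrefcount's argument
alias = data
print(sys.getrefcount(data))   # 3: 'data', 'alias', plus the argument
alias = None                   # rebinding 'alias' drops its reference
print(sys.getrefcount(data))   # 2 again
data = None                    # last reference gone; CPython frees the list now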
import logging

from google.appengine.api.runtime import memory_usage

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
    # models and some_module are the app's own modules
    foos = models.Foo.all().filter('status =', 6)
    if cursor:
        foos.with_cursor(cursor)
    for foo in foos.run(batch_size=batch_size):
        logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1
        if results >= batch_size:
            cursor = foos.cursor()
            break
    else:  # the for loop finished without breaking: no more results
        break
and in some_module.py
def get_bar(foo):
    for bar in foo.bars:
        if bar.status == 10:
            return bar
    return None
Output of logging.debug (shortened)
on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....
on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total
Answer 1:
It looks like your batching solution is conflicting with db's own batching, resulting in a lot of extra batches hanging around.
When you run query.run(batch_size=batch_size), db will keep fetching batches until the entire query is exhausted. When you reach the end of a batch, db grabs the next one. However, right after db does this, you exit the loop and start again, which means batches 1 -> n will all exist in memory twice: once for the last query's fetch, and once for your next query's fetch.
If you want to loop over all your entities, just let db handle the batching:
foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
    results += 1
    bar = some_module.get_bar(foo)
    if bar:
        try:
            dict_of_results[bar.baz] += 1
        except KeyError:
            dict_of_results[bar.baz] = 1
Or, if you want to handle batching yourself, make sure db doesn't do any batching:
while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    # ... process the fetched entities here ...
    cursor = foo_query.cursor()  # the cursor comes from the query, not the result list
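Filling in the processing from the question, the self-batching version would look something like this (the loop body is just the question's counting code moved inside the fetch loop):

while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    for foo in foos:
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1
    # each pass rebinds foos to a plain list, so the previous batch
    # becomes unreachable and can be freed
    cursor = foo_query.cursor()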
Answer 2:
You might be looking in the wrong direction.
Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
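One such check, as a minimal sketch: force a collection with the standard gc module before reading memory_usage(), to distinguish "objects are still referenced" from "the collector simply hasn't run yet". The log_memory helper name is made up for illustration:

import gc
import logging

from google.appengine.api.runtime import memory_usage

def log_memory(label):
    # If memory stays high even after gc.collect(), something is still
    # holding references to the entities, and collection can't reclaim them.
    before = memory_usage().current()
    collected = gc.collect()
    after = memory_usage().current()
    logging.debug('{}: {} MB before gc, {} MB after ({} objects collected)'.format(
        label, before, after, collected))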
Source: https://stackoverflow.com/questions/32877705/how-is-memory-garbage-collected-in-app-engine-python-when-iterating-over-db-re