Question
I have some code that iterates over DB entities and runs in a task - see below.
On App Engine I'm getting an "Exceeded soft private memory limit" error, and indeed checking memory_usage().current() confirms the problem. See below for the output of the logging statement. It seems that every time a batch of foos is fetched, the memory goes up.
My question is: why is the memory not being garbage collected? I would expect that in each iteration of the loops (the while loop and the for loop, respectively) the re-use of the names foos and foo would cause the objects to which foos and foo used to point to become 'de-referenced' (i.e. inaccessible) and therefore eligible for garbage collection, and then be garbage collected as memory gets tight. But evidently that is not happening.
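(For context, that expectation matches normal CPython behavior: rebinding a name drops one reference to the old object, and an object whose reference count hits zero is freed immediately, without waiting for memory pressure. A minimal sketch, independent of App Engine; note that sys.getrefcount counts its own argument as one extra reference:)

import sys

data = [0] * 1000
print(sys.getrefcount(data))   # 2: the name 'data' plus getrefcount's argument
alias = data
print(sys.getrefcount(data))   # 3: 'data', 'alias', plus the argument
alias = None                   # rebinding 'alias' drops its reference
print(sys.getrefcount(data))   # 2 again
data = None                    # last reference gone; CPython frees the list now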
import logging

from google.appengine.api.runtime import memory_usage

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
    # models and some_module are the app's own modules
    foos = models.Foo.all().filter('status =', 6)
    if cursor:
        foos.with_cursor(cursor)
    for foo in foos.run(batch_size=batch_size):
        logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1
        if results >= batch_size:
            cursor = foos.cursor()
            break
    else:  # the for loop finished without breaking: no more results
        break
and in some_module.py
def get_bar(foo):
    for bar in foo.bars:
        if bar.status == 10:
            return bar
    return None
Output of logging.debug (shortened)
on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....
on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total
Answer 1:
It looks like your batching solution is conflicting with db's own batching, resulting in a lot of extra batches hanging around.
When you run query.run(batch_size=batch_size), db will keep fetching batches until the entire query is exhausted. When you reach the end of a batch, db grabs the next one. However, right after db does this, you exit the loop and start again, which means batches 1 -> n will all exist in memory twice: once for the last query's fetch, and once for your next query's fetch.
If you want to loop over all your entities, just let db handle the batching:
foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
    results += 1
    bar = some_module.get_bar(foo)
    if bar:
        try:
            dict_of_results[bar.baz] += 1
        except KeyError:
            dict_of_results[bar.baz] = 1
Or, if you want to handle batching yourself, make sure db doesn't do any batching:
while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    # ... process the fetched entities here ...
    cursor = foo_query.cursor()  # the cursor comes from the query, not the result list
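Filling in the processing from the question, the self-batching version would look something like this (the loop body is just the question's counting code moved inside the fetch loop):

while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    for foo in foos:
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1
    # each pass rebinds foos to a plain list, so the previous batch
    # becomes unreachable and can be freed
    cursor = foo_query.cursor()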
Answer 2:
You might be looking in the wrong direction.
Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
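One such check, as a minimal sketch: force a collection with the standard gc module before reading memory_usage(), to distinguish "objects are still referenced" from "the collector simply hasn't run yet". The log_memory helper name is made up for illustration:

import gc
import logging

from google.appengine.api.runtime import memory_usage

def log_memory(label):
    # If memory stays high even after gc.collect(), something is still
    # holding references to the entities, and collection can't reclaim them.
    before = memory_usage().current()
    collected = gc.collect()
    after = memory_usage().current()
    logging.debug('{}: {} MB before gc, {} MB after ({} objects collected)'.format(
        label, before, after, collected))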
Source: https://stackoverflow.com/questions/32877705/how-is-memory-garbage-collected-in-app-engine-python-when-iterating-over-db-re