How is memory garbage collected in app engine (python) when iterating over db results

Submitted by 邮差的信 on 2019-12-10 19:47:25

Question


I have some code that iterates over DB entities, and runs in a task - see below.

On app engine I'm getting Exceeded soft private memory limit error, and indeed checking memory_usage().current() confirms the problem. See below for output from logging statement. It seems that every time a batch of foos is fetched the memory goes up.

My question is: why is the memory not being garbage collected? I would expect that, in each iteration of the loops (the while loop and the for loop, respectively), the re-use of the names foos and foo would cause the objects they previously pointed to to be 'de-referenced' (i.e. become inaccessible), therefore become eligible for garbage collection, and then be garbage collected as memory gets tight. But evidently that is not happening.
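For what it's worth, that intuition about rebinding is correct in plain CPython. A minimal sketch (hypothetical names, run outside App Engine) showing that rebinding a name does make the old object collectible:

```python
import gc
import weakref


class Blob(object):
    """Stand-in for a fetched entity."""
    pass


foo = Blob()
observer = weakref.ref(foo)  # watch the object without keeping it alive

foo = Blob()  # rebind the name; the first Blob loses its last reference
gc.collect()  # CPython's refcounting frees it even without this call

assert observer() is None  # the old object really was collected
```

So rebinding itself does release old objects; any extra memory must be held by references kept somewhere else, such as inside the query machinery.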

import logging

from google.appengine.api.runtime import memory_usage

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
  foos = models.Foo.all().filter('status =', 6)
  if cursor:
     foos.with_cursor(cursor)

  for foo in foos.run(batch_size = batch_size):

     logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
     results += 1

     bar = some_module.get_bar(foo)

     if bar:
        try:
           dict_of_results[bar.baz] += 1
        except KeyError:
           dict_of_results[bar.baz] = 1


     if results >= batch_size:
        cursor = foos.cursor()
        break

  else:
     break   

and in some_module.py

def get_bar(foo):

  for bar in foo.bars:
    if bar.status == 10:
       return bar

  return None  

Output of logging.debug (shortened)

on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....

on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total

Answer 1:


It looks like your batching solution is conflicting with db's own batching, resulting in a lot of extra batches hanging around.

When you call query.run(batch_size=batch_size), db fetches results in batches of batch_size. When you reach the end of a batch, db grabs the next one. However, right after db does this, you exit the loop and start a new query. What this means is that batches 1 -> n will all exist in memory twice: once from the previous query's fetch, once from your next query's fetch.
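The double residency can be mimicked with a toy prefetching iterator (plain Python, hypothetical names; the real db internals use async RPCs, but the shape is similar): after consuming exactly batch_size items, a second batch has already been fetched, and breaking out of the loop leaves it stranded.

```python
fetched = []  # record of every batch the fake datastore hands out


def fake_datastore(offset, size):
    """Pretend datastore: 20 entities total, served in slices."""
    batch = list(range(offset, min(offset + size, 20)))
    fetched.append(batch)
    return batch


def batched_run(fetch, batch_size):
    """Yield items one batch at a time, prefetching the next batch
    eagerly, roughly like query.run(batch_size=...) does."""
    offset = 0
    batch = fetch(offset, batch_size)
    while batch:
        offset += batch_size
        next_batch = fetch(offset, batch_size)  # prefetched before use
        for item in batch:
            yield item
        batch = next_batch


it = batched_run(fake_datastore, 10)
consumed = [next(it) for _ in range(10)]  # take exactly one batch, then stop

assert len(consumed) == 10
assert len(fetched) == 2  # a second batch was fetched that we never used
```

Break out here, start a fresh query from the cursor, and that unused second batch is fetched all over again while the first copy lingers.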

If you want to loop over all your entities, just let db handle the batching:

foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size = batch_size):
  results += 1
  bar = some_module.get_bar(foo)
  if bar:
    try:
      dict_of_results[bar.baz] += 1
    except KeyError:
      dict_of_results[bar.baz] = 1

Or, if you want to handle batching yourself, use fetch() so db doesn't do any batching of its own:

while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)

  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  for foo in foos:
    # process each entity as before
    ...

  # the cursor lives on the query object, not on the result list
  cursor = foo_query.cursor()



Answer 2:


You might be looking in the wrong direction.

Take a look at this Q&A for approaches to check on garbage collection and for potential alternate explanations: Google App Engine DB Query Memory Usage
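One quick way to rule lazy collection in or out (a sketch; adapt the measurement to your task) is to force a collection and check whether the tracked-object count actually drops once references are gone — if it does not, something is still holding on to the objects:

```python
import gc

gc.collect()  # settle the heap before measuring
before = len(gc.get_objects())

junk = [[i] for i in range(1000)]  # 1000 gc-tracked list objects
del junk  # drop the only reference, as rebinding foos/foo would
gc.collect()

after = len(gc.get_objects())
assert after < before + 500  # the 1000 lists did not survive collection
```

If the count stays flat like this but process memory still climbs, the growth is coming from live references (or allocator behaviour), not from the collector being lazy.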



Source: https://stackoverflow.com/questions/32877705/how-is-memory-garbage-collected-in-app-engine-python-when-iterating-over-db-re
