Memory leak in Google ndb library

旧巷少年郎 2020-12-03 09:10

I think there is a memory leak in the ndb library, but I cannot find where.

Is there a way to avoid the problem described below?
Do you have a more

3 Answers
  • 2020-12-03 09:27

    A possible workaround is to call context.clear_cache() and gc.collect() at the end of the get method.

    import gc

    from google.appengine.ext import ndb

    def get(self):
        for _ in range(10):
            for entity in DummyModel.query().iter():
                pass  # Do whatever you want with the entity
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')
        # Drop the in-context cache and force a collection so the
        # entities fetched above can actually be freed.
        context = ndb.get_context()
        context.clear_cache()
        gc.collect()
    
  • 2020-12-03 09:34

    After more investigation, and with the help of a Google engineer, I've found two explanations for my memory consumption.

    Context and thread

    ndb.Context is a thread-local object and is only cleared when a new request comes in on that thread, so the thread holds on to it between requests. Many threads may exist in a GAE instance, and it may take hundreds of requests before a given thread is used a second time and its context cleared.
    This is not a memory leak, but the combined size of these contexts may exceed the available memory of a small GAE instance.

    Workaround:
    You cannot configure the number of threads used in a GAE instance, so it is best to keep each context as small as possible: avoid the in-context cache, and clear it after each request, as in the sketch below.
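
    A minimal sketch of that advice, assuming the legacy google.appengine.ext.ndb API (set_cache_policy and clear_cache are real Context methods; the handler shape and DummyModel are illustrative):

    import gc

    from google.appengine.ext import ndb

    def get(self):
        ctx = ndb.get_context()
        ctx.set_cache_policy(False)  # never populate the in-context cache

        for entity in DummyModel.query().iter():
            pass  # Do whatever you want

        # Clear anything that did accumulate before the thread parks
        # this context until its next request.
        ctx.clear_cache()
        gc.collect()

    The legacy API also supports a per-model _use_cache class attribute if you only want to exempt the noisy entity kinds.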

    Event queue

    It seems that NDB does not guarantee that the event queue is emptied after a request. Again, this is not a memory leak, but it leaves Futures in your thread's context, and you're back to the first problem.

    Workaround:
    Wrap all your code that uses NDB with @ndb.toplevel, as in the sketch below.
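
    A minimal sketch, assuming a webapp2 handler (webapp2, MainHandler and DummyModel are illustrative; @ndb.toplevel is the real decorator):

    import webapp2

    from google.appengine.ext import ndb

    class MainHandler(webapp2.RequestHandler):
        @ndb.toplevel
        def get(self):
            # @ndb.toplevel makes the handler wait for every async NDB
            # operation started here, so no Futures are left behind on
            # the thread's event queue or context.
            for entity in DummyModel.query().iter():
                pass  # Do whatever you want
            self.response.write('done')

    The same decorator can also wrap the whole WSGI application, e.g. app = ndb.toplevel(webapp2.WSGIApplication([...])), so every handler is covered at once.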

    0 讨论(0)
  • 2020-12-03 09:49

    There is a known issue with NDB. You can read about it here, and there is a workaround here:

    The non-determinism observed with fetch_page is due to the iteration order of eventloop.rpcs, which is passed to datastore_rpc.MultiRpc.wait_any(); apiproxy_stub_map.__check_one selects the last rpc from the iterator.

    Fetching with a page_size of 10 does an RPC with count=10, limit=11, a standard technique to force the backend to determine more accurately whether there are more results. This returns 10 results, but due to a bug in the way the QueryIterator is unraveled, an RPC is added to fetch the last entry (using the obtained cursor and count=1). NDB then returns the batch of entities without processing this RPC. I believe this RPC will not be evaluated until it is selected at random (if MultiRpc consumes it before a necessary RPC), since it doesn't block client code.

    Workaround: use iter(). This method does not have the issue (count and limit will be the same), and it avoids the performance and memory problems associated with fetch_page described above.
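
    A sketch of paging with iter() instead of fetch_page(), assuming the legacy QueryIterator API (produce_cursors, probably_has_next and cursor_after are real methods; DummyModel and the page size are illustrative):

    from google.appengine.ext import ndb

    # Instead of: results, cursor, more = DummyModel.query().fetch_page(10)
    qit = DummyModel.query().iter(limit=10, produce_cursors=True)
    page = [entity for entity in qit]  # issues an RPC with count == limit

    # Resume from the cursor later, much like fetch_page's cursor.
    if qit.probably_has_next():
        cursor = qit.cursor_after()
        next_page = list(DummyModel.query().iter(
            limit=10, produce_cursors=True, start_cursor=cursor))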
