In Google App Engine, how do I reduce memory consumption as I write a file out to the blobstore, rather than exceeding the soft memory limit?

伪装坚强ぢ 2021-02-04 15:11

I'm using the blobstore to back up and recover entities in CSV format. The process works well for all of my smaller models. However, once I start to work on models with more data, I exceed the soft memory limit.

4 Answers
  • 2021-02-04 15:40

    I can't speak to the memory use in Python, but judging from your error message, the problem most likely stems from the fact that a blobstore-backed file in GAE can't be kept open for more than about 30 seconds, so you have to close and reopen it periodically if your processing takes longer.

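    A minimal sketch of that close-and-reopen pattern, assuming the old (now deprecated) Files API; the batch size and the generate_rows() source are placeholders, not part of the question:

    from google.appengine.api import files

    # Create a writable blobstore-backed file (deprecated Files API).
    file_name = files.blobstore.create(mime_type='text/csv')

    rows_per_open = 500  # hypothetical batch size; tune for your entity size
    buffered = []
    for row in generate_rows():  # placeholder for your CSV row source
        buffered.append(row)
        if len(buffered) >= rows_per_open:
            # Each open/write/close cycle stays well under the ~30 second limit.
            with files.open(file_name, 'a') as f:
                f.write(''.join(buffered))
            buffered = []
    if buffered:
        with files.open(file_name, 'a') as f:
            f.write(''.join(buffered))

    files.finalize(file_name)
    blob_key = files.blobstore.get_blob_key(file_name)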
  • 2021-02-04 15:44

    In "What is the proper way to write to the Google App Engine blobstore as a file in Python 2.5" a similar problem was reported. One answer there suggests inserting occasional gc.collect() calls. Given what I know of the Files API's implementation, I think that is spot on. Give it a try!

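    As a rough sketch, assuming q is your query and writer is your csv writer (the 500-row interval is arbitrary):

    import gc

    # Hypothetical export loop: force a garbage collection every few
    # hundred rows to release buffers the Files API may still be holding.
    for count, entity in enumerate(q):
        writer.writerow(get_dict_for_entity(entity))
        if count % 500 == 0:
            gc.collect()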
  • 2021-02-04 15:47

    You'd be better off not doing the batching yourself, but just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate:

    q = model.all()
    for entity in q:  # the iterator fetches results in batches behind the scenes
        row = get_dict_for_entity(entity)
        writer.writerow(row)
    

    This avoids re-running the query with ever-increasing offset, which is slow and causes quadratic behavior in the datastore.
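    For contrast, the offset-based pattern this warns against looks roughly like this (a sketch; the batch size is arbitrary):

    # Anti-pattern: each query skips over every previously returned result,
    # so total datastore work grows quadratically with the entity count.
    offset = 0
    batch_size = 100  # hypothetical
    while True:
        batch = model.all().fetch(batch_size, offset=offset)
        if not batch:
            break
        for entity in batch:
            writer.writerow(get_dict_for_entity(entity))
        offset += batch_size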

    An oft-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times the RAM compared to the serialized form of the entity; e.g. an entity that is 3KB on disk might use 100KB in RAM. (The exact blow-up factor depends on many factors; it's worse if you have lots of properties with long names and small values, even worse for repeated properties with long names.)

  • 2021-02-04 15:49

    It could also be a request deadline error, since online requests are limited to 30 seconds. In my implementation, to get around this, instead of running the operation in a webapp handler I enqueue a task on the default queue. The nice thing about the task queue is that it takes one line of code to invoke, tasks get a 10-minute time limit, and a failed task is retried automatically. I'm not sure it will solve your problem, but it's worth a try.

    from google.appengine.api import taskqueue
    ...
    taskqueue.add(url="the url that invokes your method")
    
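    On the receiving end, a minimal sketch of the task handler (the URL and names are hypothetical), which runs under the task queue's 10-minute deadline:

    from google.appengine.ext import webapp

    class ExportWorker(webapp.RequestHandler):
        def post(self):  # the task queue invokes the enqueued URL with an HTTP POST
            run_csv_export()  # placeholder for your long-running export logic

    # Map the enqueued URL to the worker, e.g. taskqueue.add(url='/tasks/export')
    application = webapp.WSGIApplication([('/tasks/export', ExportWorker)])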

    You can find more info about task queues in the App Engine documentation.

    Or consider using a backend for serious computations and file operations.
