In Google App Engine, how do I reduce memory consumption as I write a file out to the blobstore, rather than exceeding the soft memory limit?

伪装坚强ぢ 2021-02-04 15:11

I'm using the blobstore to back up and recover entities in CSV format. The process works well for all of my smaller models. However, once I start to work on models with more data, I exceed the soft memory limit.

4 Answers
  • 2021-02-04 15:40

    I can't speak to the memory use in Python, but judging from your error message, the problem most likely stems from the fact that a blobstore-backed file in GAE can't be kept open for more than about 30 seconds, so you have to close and reopen it periodically if your processing takes longer.

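    A minimal sketch of that close-and-reopen pattern, assuming the old (now deprecated) Files API; the batch size and the generate_rows() source are placeholders, not part of the question:

    from google.appengine.api import files

    # Create a writable blobstore-backed file (deprecated Files API).
    file_name = files.blobstore.create(mime_type='text/csv')

    rows_per_open = 500  # hypothetical batch size; tune for your entity size
    buffered = []
    for row in generate_rows():  # placeholder for your CSV row source
        buffered.append(row)
        if len(buffered) >= rows_per_open:
            # Each open/write/close cycle stays well under the ~30 second limit.
            with files.open(file_name, 'a') as f:
                f.write(''.join(buffered))
            buffered = []
    if buffered:
        with files.open(file_name, 'a') as f:
            f.write(''.join(buffered))

    files.finalize(file_name)
    blob_key = files.blobstore.get_blob_key(file_name)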
  • 2021-02-04 15:44

    In "What is the proper way to write to the Google App Engine blobstore as a file in Python 2.5" a similar problem was reported. One answer there suggests inserting occasional gc.collect() calls. Given what I know of the Files API's implementation, I think that is spot on. Give it a try!

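    As a rough sketch, assuming q is your query and writer is your csv writer (the 500-row interval is arbitrary):

    import gc

    # Hypothetical export loop: force a garbage collection every few
    # hundred rows to release buffers the Files API may still be holding.
    for count, entity in enumerate(q):
        writer.writerow(get_dict_for_entity(entity))
        if count % 500 == 0:
            gc.collect()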
  • 2021-02-04 15:47

    You'd be better off not doing the batching yourself, but just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate:

    q = model.all()
    for entity in q:  # the iterator fetches results in batches behind the scenes
        row = get_dict_for_entity(entity)
        writer.writerow(row)
    

    This avoids re-running the query with ever-increasing offset, which is slow and causes quadratic behavior in the datastore.
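    For contrast, the offset-based pattern this warns against looks roughly like this (a sketch; the batch size is arbitrary):

    # Anti-pattern: each query skips over every previously returned result,
    # so total datastore work grows quadratically with the entity count.
    offset = 0
    batch_size = 100  # hypothetical
    while True:
        batch = model.all().fetch(batch_size, offset=offset)
        if not batch:
            break
        for entity in batch:
            writer.writerow(get_dict_for_entity(entity))
        offset += batch_size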

    An oft-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times the RAM compared to the serialized form of the entity; e.g. an entity that is 3KB on disk might use 100KB in RAM. (The exact blow-up factor depends on many factors; it's worse if you have lots of properties with long names and small values, even worse for repeated properties with long names.)

  • 2021-02-04 15:49

    It could also be a request deadline error, since online requests are limited to 30 seconds. In my implementation, to get around this, instead of running the operation in a webapp handler I enqueue a task on the default queue. The nice thing about the task queue is that it takes one line of code to invoke, tasks get a 10-minute time limit, and a failed task is retried automatically. I'm not sure it will solve your problem, but it's worth a try.

    from google.appengine.api import taskqueue
    ...
    taskqueue.add(url="the url that invokes your method")
    
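    On the receiving end, a minimal sketch of the task handler (the URL and names are hypothetical), which runs under the task queue's 10-minute deadline:

    from google.appengine.ext import webapp

    class ExportWorker(webapp.RequestHandler):
        def post(self):  # the task queue invokes the enqueued URL with an HTTP POST
            run_csv_export()  # placeholder for your long-running export logic

    # Map the enqueued URL to the worker, e.g. taskqueue.add(url='/tasks/export')
    application = webapp.WSGIApplication([('/tasks/export', ExportWorker)])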

    You can find more info about task queues in the App Engine documentation.

    Or consider using a backend for serious computations and file operations.
