Improve throughput of ndb query over large data

余生分开走 2021-02-04 21:16

I am trying to perform some data processing in a GAE application over data that is stored in the Datastore. The bottleneck is the throughput at which the query returns ent

3 Answers

独厮守ぢ (original poster)

2021-02-04 21:37

In case anyone is interested, I was able to significantly increase the throughput of the data processing by redesigning the component. It was suggested that I change the data models, but that was not possible.

First, I segmented the data and processed each data segment in a separate taskqueue.Task, instead of calling fetch_page_async multiple times from a single task (as I described in the first post). Initially, GAE processed these tasks sequentially, using only a single Fx instance. To run the tasks in parallel, I moved the component to a dedicated GAE module that uses basic scaling, i.e. addressable Bx instances. When I enqueue the task for each data segment, I explicitly specify which basic instance will handle it by setting the 'target' option.
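The approach described above could be sketched roughly as follows. This is only an illustration, not the code I actually run: the module name "worker", the handler URL "/process_segment", the "first"/"last" parameters used to identify a segment, and the "<instance>.<module>" form of the target string are all assumptions made for the example. Check the target format against the GAE routing documentation for your own app.

```python
def split_into_segments(items, num_segments):
    """Split items into num_segments contiguous, near-equal slices."""
    size, extra = divmod(len(items), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        # The first `extra` segments get one additional item each.
        end = start + size + (1 if i < extra else 0)
        segments.append(items[start:end])
        start = end
    return segments


def build_task_specs(items, num_instances,
                     module="worker", url="/process_segment"):
    """Build one task description per segment, pinned to one instance.

    `module`, `url` and the params layout are hypothetical names
    chosen for this sketch.
    """
    specs = []
    segments = split_into_segments(items, num_instances)
    for instance_id, segment in enumerate(segments):
        if not segment:
            continue  # fewer items than instances: nothing to send
        specs.append({
            "url": url,
            # Assumed addressing form: "<instance>.<module>".
            "target": "%d.%s" % (instance_id, module),
            "params": {"first": segment[0], "last": segment[-1]},
        })
    return specs


def enqueue_all(specs, add_task):
    """Enqueue every spec through the supplied callable.

    Returns the number of tasks enqueued.
    """
    for spec in specs:
        add_task(**spec)
    return len(specs)
```

Inside a GAE application, the callable passed to enqueue_all would be taskqueue.add from google.appengine.api, which accepts url, params and target as keyword arguments. Keeping the enqueue call injectable also makes the segmentation logic easy to try outside the App Engine runtime.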

With this design, I was able to process 20,000 entities in total within 4-5 seconds (instead of 40'-60'!), using 5 B4 instances.

This comes at an additional cost because of the Bx instances, so we'll have to fine-tune the type and number of basic instances we need.
