pandas pytables append: performance and increase in file size

天涯浪子 提交于 2019-12-01 09:20:35
Jeff

This is basically the answer here, which I recently answered.

Bottom line is this, you need to turn off indexing store.append('df',df,index=False). When creating the store, then index it at the end.

Furthermore turn off compression when merging the tables as well.

Indexing is a fairly expensive operation and if I recall correctly, only uses a single processor.

Finally, make sure that you create the merged with with mode='w' as all of the subsequent operations are appends and you want to start with a clean new file.

I also would NOT specify the chunksize upfront. Rather, after you have created the final index, perform the compression using ptrepack and specify chunksize=auto which will compute it for you. I don't think this will affect write performance but will optimize query performance.

You might try tweaking the chunksize parameter to append (this is the writing chunksize) to a larger number as well.

Obviously make sure that each of the appending tables has exactly the same structure (will raise if this is not the case).

I created this issue for an enhancement to do this 'internally': https://github.com/pydata/pandas/issues/6837

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!