Pandas: in-memory sorting of HDF5 files

I have the following problem:

I have a set of several HDF5 files with similar data frames, which I want to sort globally based on multiple columns.

My input is the list of file names and an ordered list of the columns I want to sort by. The output should be a single HDF5 file containing all the sorted data.
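
To make the role of the ordered column list concrete, here is a small in-memory illustration with made-up data: pandas `DataFrame.sort_values` treats the first column in `by` as the primary key and uses later columns only to break ties. The frame and the column names `a` and `b` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the frame stored in one file.
df = pd.DataFrame({"a": [2, 1, 2, 1], "b": [1, 2, 0, 1]})

# "a" is the primary key; "b" only decides the order among equal "a" values.
sort_columns = ["a", "b"]
ordered = df.sort_values(by=sort_columns)
print(ordered[sort_columns].to_numpy().tolist())  # [[1, 1], [1, 2], [2, 0], [2, 1]]
```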

Each file can contain millions of rows. I can afford to load a single file into memory, but not the entire dataset.
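
Since one file fits in memory, a standard way around the memory limit (a suggestion, not something the question proposes) is an external merge sort: sort each file on its own with `df.sort_values(by=columns)`, write it back as a sorted "run", then stream a k-way merge of the runs. The sketch below covers only the merge step, using `heapq.merge`. The names `merge_sorted_runs` and `chunk_rows` are hypothetical, and it assumes ascending order and no NaN values in the sort columns.

```python
import heapq
from operator import itemgetter

import pandas as pd


def _rows(chunks):
    """Flatten an iterable of DataFrame chunks into plain row tuples."""
    for chunk in chunks:
        yield from chunk.itertuples(index=False, name=None)


def merge_sorted_runs(runs, by, columns, chunk_rows=100_000):
    """K-way merge of runs that are each already sorted by `by` (hypothetical).

    Each run is an iterable of DataFrame chunks whose columns appear in the
    order given by `columns`. If the runs are lazy (e.g. chunked reads from
    disk), memory holds roughly one chunk per run plus one output buffer.
    Assumes ascending order and no NaN in the `by` columns.
    """
    key = itemgetter(*[columns.index(c) for c in by])
    merged = heapq.merge(*(_rows(run) for run in runs), key=key)
    buffer = []
    for row in merged:
        buffer.append(row)
        if len(buffer) >= chunk_rows:
            # Emit a bounded-size output chunk and start a new buffer.
            yield pd.DataFrame(buffer, columns=columns)
            buffer = []
    if buffer:
        yield pd.DataFrame(buffer, columns=columns)
```

Each yielded chunk could then be appended to the single output file, for example with `pandas.HDFStore.append`; the runs themselves could be lazy chunked reads such as `pd.read_hdf(path, key, chunksize=...)` on table-format stores.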

Naively, I would first copy all the data into a single HDF5 file (which is not difficult) and then find a way to do in-memory sorting of this huge file.
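
The first step described here, copying everything into one HDF5 file, could look roughly like the sketch below. It assumes each input file holds a single frame under a key named `df` (a hypothetical key) and that PyTables is installed, since `pandas.HDFStore` depends on it. Appending in table format keeps only one input file in memory at a time; `concat_hdf5` is a hypothetical helper name.

```python
import pandas as pd  # pandas.HDFStore needs the PyTables package installed


def concat_hdf5(input_paths, output_path, key="df"):
    """Copy every input frame into one table-format HDF5 store (hypothetical).

    Assumes each input file stores one DataFrame under `key` and that all
    frames share the same columns and dtypes. Only one input frame is held
    in memory at a time.
    """
    with pd.HDFStore(output_path, mode="w") as out:
        for path in input_paths:
            frame = pd.read_hdf(path, key=key)
            # format="table" makes the node appendable (and queryable later).
            out.append(key, frame, format="table")
```

If string columns have different maximum lengths across files, the append may need the `min_itemsize` argument to reserve enough width.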

Is there a quick way to sort, in memory, a pandas data structure stored in an HDF5 file by multiple columns?
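
One way to sort a too-large file by several columns (not taken from the question) is to load only the sort columns, which are usually far smaller than whole rows, and compute the sorted row order from them with NumPy's `np.lexsort`. For a table-format store those columns could be read with `pd.read_hdf(path, key, columns=...)`. The helper name `sort_permutation` is hypothetical, and the sketch assumes ascending order on every column.

```python
import numpy as np
import pandas as pd


def sort_permutation(keys, columns):
    """Return row positions that order `keys` by `columns` (hypothetical).

    `keys` only needs the sort columns; the first entry of `columns` is the
    primary key. Assumes ascending order on all columns.
    """
    # np.lexsort treats the LAST array as the primary key, so reverse the list.
    return np.lexsort([keys[c].to_numpy() for c in reversed(columns)])
```

Writing the full rows out in this order, block by block, is a separate step that this sketch leaves out.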

I have already looked at ptrepack, but it seems to allow sorting on only a single column.
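
Since ptrepack can order a table by only one column, a common workaround (an assumption here, not something the question tries) is to pack the sort columns into a single integer key whose order matches the multi-column order. The sketch below does this for non-negative integer columns with known upper bounds; `composite_key` and the `bounds` mapping are hypothetical, and string or float columns would need a different encoding.

```python
import math

import numpy as np
import pandas as pd


def composite_key(df, columns, bounds):
    """Pack non-negative integer columns into one int64 sort key (hypothetical).

    Assumes every value in column c lies in [0, bounds[c]) and that the
    product of the bounds fits in int64. The first entry of `columns`
    becomes the most significant "digit" of the key.
    """
    if math.prod(bounds[c] for c in columns) > np.iinfo(np.int64).max:
        raise ValueError("bounds too large to pack into int64")
    key = np.zeros(len(df), dtype=np.int64)
    for col in columns:
        values = df[col].to_numpy(dtype=np.int64)
        if values.size and (values.min() < 0 or values.max() >= bounds[col]):
            raise ValueError(f"column {col!r} outside [0, {bounds[col]})")
        # Shift the digits built so far by this column's range, then add it.
        key = key * np.int64(bounds[col]) + values
    return key
```

The resulting key column could then serve as the single column that a one-column sort operates on.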

Source: https://stackoverflow.com/questions/24526254/pandas-in-memory-sorting-hdf5-files

Tags: pandas, hdf5, pytables
