Iteratively writing to HDF5 Stores in Pandas


别跟我提以往

Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:

Prepare some data:

2 Answers

生来不讨喜

2020-12-23 08:27
1. As soon as the statement is executed, e.g. store['df'] = df. Closing the store just closes the actual file (it will be closed for you if the process exits, but a warning message will be printed).
2. Read the section http://pandas.pydata.org/pandas-docs/dev/io.html#storing-in-table-format
  
  It is generally not a good idea to put a LOT of nodes in an .h5 file. You probably want to append and create a smaller number of nodes.
  
  You can just iterate through your .csv files and store/append them one by one. Something like:
```
import pandas as pd

for f in files:
  df = pd.read_csv(f)
  # write each CSV to its own node, using the file name as the key
  df.to_hdf('file.h5', key=f)
```
  That would be one way (creating a separate node for each file).
3. Not appendable: once you write it, you can only retrieve it all at once, i.e. you cannot select a subsection.
  
  If you have a table, then you can do things like:
```
pd.read_hdf('my_store.h5', 'a_table_node', where='index > 100')
```
  which is like a database query, only getting part of the data
  
  Thus, a store is not appendable, nor queryable, while a table is both.
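
  To make the difference concrete, here is a minimal sketch that writes the same frame once in fixed format and once in table format, then queries the table. The function name compare_formats, the node names and the sample data are made up for illustration, and it assumes PyTables is installed:

```python
import os
import tempfile

import pandas as pd


def compare_formats(path):
    """Write one frame in fixed and in table format, then query the table."""
    # Hypothetical sample data, used only for this illustration.
    df = pd.DataFrame({"a": range(200), "b": [x * 0.5 for x in range(200)]})

    # Fixed format: can only be read back as a whole.
    df.to_hdf(path, key="fixed_node", format="fixed", mode="w")

    # Table format: supports appending and where-style queries.
    df.to_hdf(path, key="table_node", format="table", mode="a")

    # Only the rows whose index is greater than 100 are selected.
    subset = pd.read_hdf(path, key="table_node", where="index > 100")

    # Passing a where clause for a fixed node is rejected by pandas.
    try:
        pd.read_hdf(path, key="fixed_node", where="index > 100")
        fixed_queryable = True
    except (TypeError, ValueError):
        fixed_queryable = False

    return subset, fixed_queryable


with tempfile.TemporaryDirectory() as tmp:
    subset, fixed_queryable = compare_formats(os.path.join(tmp, "my_store.h5"))
    print(len(subset), fixed_queryable)
```

  The same where string works with HDFStore.select if you already have the store open.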
既然无缘

2020-12-23 08:32
Answering question 2, with pandas 0.18.0 you can do:
```
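import pandas as pd

store = pd.HDFStore('compiled_measurements.h5')
for filepath in file_iterator:
    raw = pd.read_csv(filepath)
    # skip indexing on each append; a, b, c must be data columns to be indexable later
    store.append('measurements', raw, index=False, data_columns=['a', 'b', 'c'])

# build the index once, after all the data has been appended
store.create_table_index('measurements', columns=['a', 'b', 'c'], optlevel=9, kind='full')
store.close()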
```
Based on this part of the docs.

Depending on how much data you have, the index creation can consume enormous amounts of memory. The PyTables docs describe the values of optlevel.
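
If memory is a concern, one possible variation is to read each CSV in chunks and to build the index with a lower optlevel. The sketch below is only an illustration under assumptions: the helper name compile_csvs, the chunk size, the single data column a and the choice of optlevel=6 with kind='medium' are not from the answer above, and all columns are assumed to be numeric with the same dtype in every file.

```python
import os
import tempfile

import pandas as pd


def compile_csvs(csv_paths, h5_path, key="measurements", chunksize=1000):
    """Append several CSV files to one table node, then index it once."""
    with pd.HDFStore(h5_path, mode="w") as store:
        for csv_path in csv_paths:
            # Chunked reading keeps at most `chunksize` rows in memory at a time.
            for chunk in pd.read_csv(csv_path, chunksize=chunksize):
                # index=False defers index creation; "a" is a data column
                # so that it can be indexed and queried afterwards.
                store.append(key, chunk, index=False, data_columns=["a"])
        # Index once at the end. PyTables documents higher optlevel values
        # as needing more CPU, memory and I/O while the index is built.
        store.create_table_index(key, columns=["a"], optlevel=6, kind="medium")
        return store.get_storer(key).nrows


with tempfile.TemporaryDirectory() as tmp:
    # Two small hypothetical CSV files standing in for real measurements.
    paths = []
    for i in range(2):
        p = os.path.join(tmp, f"part{i}.csv")
        pd.DataFrame({"a": range(5), "b": range(5, 10)}).to_csv(p, index=False)
        paths.append(p)

    h5 = os.path.join(tmp, "compiled_measurements.h5")
    total = compile_csvs(paths, h5, chunksize=2)
    selected = pd.read_hdf(h5, "measurements", where="a > 3")
    print(total, len(selected))
```

Since the DataFrame index restarts for every file, the stored index contains duplicates; query on a data column such as a rather than on index if that matters.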