I have the following pandas dataframe:
import pandas as pd
df = pd.read_csv('filename.csv')
Now, I can use HDFStore to write the df object to a file (like adding key-value pairs to a Python dictionary):
store = pd.HDFStore('store.h5')
store['df'] = df
http://pandas.pydata.org/pandas-docs/stable/io.html
When I look at the contents, this object is a frame. Evaluating store in the interpreter outputs:
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[552,23252])
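The frame label reflects pandas' default "fixed" format, which dict-style assignment uses. A minimal sketch, assuming PyTables is installed and using a small random stand-in for the CSV data and a temporary file path (both hypothetical), that compares it with an explicit format='table' write by checking get_storer(...).is_table:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data (the real frame is 552 x 23252).
df = pd.DataFrame(np.random.randn(5, 3), columns=["a", "b", "c"])

# Hypothetical temporary path so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path) as store:
    store["df"] = df  # dict-style assignment: default 'fixed' format
    fixed_is_table = store.get_storer("df").is_table
    store.put("df_table", df, format="table")  # explicitly request 'table'
    table_is_table = store.get_storer("df_table").is_table

print(fixed_is_table, table_is_table)
```

Only the table-format node supports appending rows and where-queries; the fixed format is written and read as a whole.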
However, to use indexing, one should store this as a table object.
My approach was to try HDFStore.put(), i.e.:
HDFStore.put(key="store.h", value=df, format=Table)
However, this fails with the error:
TypeError: put() missing 1 required positional argument: 'self'
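This error comes from calling put() on the HDFStore class itself rather than on an open store object, so Python has no instance to bind to self. A minimal sketch, assuming a temporary file path, a small stand-in frame and the key "df" (all hypothetical), contrasting the two calls; note also that format expects the string 'table', not a bare name Table:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]})
path = os.path.join(tempfile.mkdtemp(), "store.h5")  # hypothetical path

# Wrong: calling the method on the class leaves `self` unbound -> TypeError.
error_message = ""
try:
    pd.HDFStore.put(key="df", value=df, format="table")
except TypeError as exc:
    error_message = str(exc)

# Right: open a store instance first, then call put() on it.
with pd.HDFStore(path) as store:
    store.put("df", df, format="table")
    keys = store.keys()

print(error_message)
print(keys)
```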
How does one save Pandas Dataframes as PyTables tables?
Common part: create or open an existing HDFStore file:
store = pd.HDFStore('store.h5')
Try this if you want all columns to be indexed:
store.append('key_name', df, data_columns=True)
or this if you want only a subset of columns to be indexed:
store.append('key_name', df, data_columns=['colA','colC','colN'])
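Declaring data columns is what lets you filter rows on disk by column value. A minimal sketch, assuming PyTables is installed, a temporary file path and a small frame with the illustrative columns colA, colB and colC (all hypothetical), that queries the stored table with select() and a where string:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "colA": [1, 5, 10, 15],
    "colB": ["a", "b", "c", "d"],
    "colC": [0.5, 1.5, 2.5, 3.5],
})
path = os.path.join(tempfile.mkdtemp(), "store.h5")  # hypothetical path

with pd.HDFStore(path) as store:
    # Only colA and colC become queryable data columns; colB is stored but not queryable.
    store.append("key_name", df, data_columns=["colA", "colC"])
    # The where condition is evaluated against the data columns while reading.
    subset = store.select("key_name", where="colA > 4 & colC < 3")

print(subset)
```

Only columns listed in data_columns (plus the index) may appear in a where condition; with data_columns=True every column qualifies.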
PS: HDFStore.append() saves DataFrames in table format by default.
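Because append() writes the table format, repeated calls under the same key add rows instead of overwriting them, which is handy when a large CSV is processed piece by piece (for example with pd.read_csv(..., chunksize=...)). A minimal sketch, assuming a temporary file path and two small hypothetical chunks in place of real CSV pieces:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")  # hypothetical path

# Hypothetical chunks standing in for pieces of a chunked CSV read.
chunk1 = pd.DataFrame({"v": [1, 2]}, index=[0, 1])
chunk2 = pd.DataFrame({"v": [3, 4]}, index=[2, 3])

with pd.HDFStore(path) as store:
    for chunk in (chunk1, chunk2):
        store.append("df", chunk)  # table format: rows accumulate under one key
    combined = store["df"]
    is_table = store.get_storer("df").is_table

print(len(combined), is_table)
```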
Adding to the accepted answer, you should always close the PyTables file. For convenience, pandas provides HDFStore as a context manager:
with pd.HDFStore('/path/to/data.hdf') as hdf:
    hdf.put(key="store.h", value=df, format='table', data_columns=True)
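A minimal sketch of what the context manager buys you, assuming a temporary file path, a small stand-in frame and the simpler key "df" (all hypothetical): the store reports itself open inside the with-block and closed afterwards, and pd.read_hdf can then reopen the file to read a filtered subset:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"colA": [1, 2, 3], "colB": [10.0, 20.0, 30.0]})
path = os.path.join(tempfile.mkdtemp(), "data.hdf")  # hypothetical path

with pd.HDFStore(path) as hdf:
    hdf.put(key="df", value=df, format="table", data_columns=True)
    was_open = hdf.is_open  # the file handle is open inside the block

# Leaving the block closes the store, even if an exception was raised.
still_open = hdf.is_open

# read_hdf opens and closes the file itself; where works because of data_columns=True.
result = pd.read_hdf(path, key="df", where="colA >= 2")
print(was_open, still_open, len(result))
```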
Source: https://stackoverflow.com/questions/38460744/how-does-one-store-a-pandas-dataframe-as-an-hdf5-pytables-table-or-carray-earr