Question
There is an HDF file 'file.h5', and a pandas DataFrame (or Series) is saved in it under the key 'df'. How can one determine in which format ('fixed' or 'table') 'df' was saved?
Thank you for your help!
Answer 1:
A bit late, but someone else may find this helpful.
You can parse the output of HDFStore.info(). Objects in table format have the type appendable:
>>> print(h5_table.info())
<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_table.h5
/df frame_table (typ->appendable,nrows->2,ncols->2,indexers->[index],dc->[])
>>> print(h5_fixed.info())
<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_fixed.h5
/df frame (shape->[2,2])
This is a minimal (i.e. without error handling for missing file or key) example:
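import pandas as pd

def get_hd5_format(path, key):
    with pd.HDFStore(path, mode='r') as store:
        info = store.info()
    # Skip the two header lines; match the key exactly so 'df' does not also match 'df2'.
    line = next(k for k in info.splitlines()[2:] if k.split()[:1] == ['/' + key])
    return 'table' if 'typ->appendable' in line.split()[2] else 'fixed'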
Example usage:
>>> get_hd5_format('/tmp/df_table.h5', 'df')
'table'
>>> get_hd5_format('/tmp/df_fixed.h5', 'df')
'fixed'
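If a file holds several keys, the same parsing idea can be applied to every entry at once. The following is only a sketch, assuming the info() layout shown in the output above; the function name formats_from_info and the extra '/other' entry in the sample text are hypothetical and used purely for illustration.

```python
def formats_from_info(info_text):
    """Map each key listed in HDFStore.info() output to 'table' or 'fixed'."""
    formats = {}
    for line in info_text.splitlines():
        if not line.startswith("/"):
            continue  # skip the class and file-path header lines
        parts = line.split(None, 2)  # key, pandas type, details
        key = parts[0].lstrip("/")
        details = parts[2] if len(parts) > 2 else ""
        # Table-format entries carry 'typ->appendable' in their details.
        formats[key] = "table" if "typ->appendable" in details else "fixed"
    return formats


# Sample text modelled on the output above; '/other' is a made-up second key.
sample = "\n".join([
    "<class 'pandas.io.pytables.HDFStore'>",
    "File path: /tmp/df_table.h5",
    "/df frame_table (typ->appendable,nrows->2,ncols->2,indexers->[index],dc->[])",
    "/other frame (shape->[2,2])",
])
print(formats_from_info(sample))
```

Since this works on the text of info() alone, it depends on that text keeping its current layout, which pandas does not appear to guarantee.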
Answer 2:
By default, the format used is "fixed", which allows fast reads and writes but is neither appendable nor searchable.
However, you can explicitly specify the format to use when saving to the HDF5 file:
df.to_hdf('file.h5', key='df', mode='w', format='table')
Note: the command above is only a sample chosen to illustrate the format parameter; set the other parameters as your use case requires.
For further reference, see the pandas documentation page:
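To see the effect of the format parameter, the sketch below writes the same frame once per format and then reads back which one was used. It assumes that the object returned by HDFStore.get_storer() exposes a boolean is_table attribute, which is an internal detail rather than documented API, and that the optional PyTables package is installed. The helper name stored_format and the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd  # HDF5 support also needs the optional PyTables package ("tables")


def stored_format(path, key):
    """Return 'table' or 'fixed' for the object stored under `key`."""
    with pd.HDFStore(path, mode="r") as store:
        storer = store.get_storer(key)
        # Assumption: storers expose a boolean `is_table` attribute.
        return "table" if storer.is_table else "fixed"


df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

with tempfile.TemporaryDirectory() as tmp:
    table_path = os.path.join(tmp, "df_table.h5")
    fixed_path = os.path.join(tmp, "df_fixed.h5")
    df.to_hdf(table_path, key="df", mode="w", format="table")
    df.to_hdf(fixed_path, key="df", mode="w")  # no format given, so the default applies
    formats = (stored_format(table_path, "df"), stored_format(fixed_path, "df"))

print(formats)
```

Opening the store with mode="r" keeps the check read-only, so a mistyped path raises an error instead of creating an empty file.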
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
Hope the above information helps.
Source: https://stackoverflow.com/questions/50569465/determine-format-of-a-dataframe-in-pandas-hdf-file