Question
I am storing time-series data in HDF5 format within pandas. Because I want to be able to access the data directly on disk, I am using the PyTables format with table=True when writing.
It appears that I then lose the frequency information on my TimeSeries objects after writing them to HDF5. This can be seen by toggling the is_table value in the script below:
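import pandas as pd
# Toggle between the PyTables Table format (True) and the non-table format (False).
is_table = False
times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(xrange(3), index=times)
print 'frequency before =', series.index.freq
frame = pd.DataFrame(series)
# Write the frame (assumes the data/ directory exists), then reopen the file and read it back.
with pd.get_store('data/simple.h5') as store:
    store.put('data', frame, table=is_table)
with pd.get_store('data/simple.h5') as store:
    x = store['data']
# Column 0 of the round-tripped frame is a Series; inspect the frequency of its index.
print 'frequency after =', x[0].index.freq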
With is_table = False:
frequency before = <1 Hour>
frequency after = <1 Hour>
With is_table = True:
frequency before = <1 Hour>
frequency after = None
It seems to me that PyTables provides a much richer storage mechanism, so I would not expect this to be the case.
Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?
Answer 1:
I just confirmed with the pandas developers that this is not implemented in the current release.
See: https://github.com/pydata/pandas/issues/3499#issuecomment-17262905 for a workaround.
I will update this answer when it becomes available.
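Another possible workaround is to re-infer the frequency after reading the data back. This is a minimal sketch under my own assumptions, not necessarily the approach in the linked issue comment. The helper name restore_freq is hypothetical; pd.infer_freq and DataFrame.asfreq are standard pandas functions. The example does not touch HDF5: it simulates a frame read back from a table-format store by building an index with the same timestamps and no frequency.

```python
import pandas as pd


def restore_freq(frame):
    """Hypothetical helper: re-attach a frequency to a DatetimeIndex that
    came back from HDFStore with freq=None.

    Returns the frame unchanged when no regular frequency can be inferred.
    """
    # pd.infer_freq raises ValueError when given fewer than 3 timestamps.
    if len(frame.index) < 3:
        return frame
    # Returns an offset alias such as 'H' (or 'h' in newer pandas), or None if irregular.
    inferred = pd.infer_freq(frame.index)
    if inferred is None:
        return frame
    # asfreq reindexes onto a regular date_range with that frequency; for an
    # already-regular index this keeps the same rows and sets index.freq.
    return frame.asfreq(inferred)


# Simulate what comes back from a table-format store: same timestamps, no freq.
lost = pd.DataFrame(
    {0: range(3)},
    index=pd.to_datetime(["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 02:00"]),
)
print("frequency after read =", lost.index.freq)  # None
restored = restore_freq(lost)
print("frequency after restore =", restored.index.freq)
```

The helper only calls asfreq when infer_freq has already found the index to be regular, so the reindexing step cannot introduce NaN rows; an irregular index is returned as-is.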
Source: https://stackoverflow.com/questions/16311045/in-pandas-why-is-frequency-information-lost-when-storing-in-hdf5-as-a-table