(In Pandas) Why is frequency information lost when storing in HDF5 as a Table?

Question

I am storing time series data in HDF5 format with pandas. Because I want to be able to access the data directly on disk, I am using the PyTables table format (table=True) when writing.

It appears that I then lose the frequency information on my TimeSeries objects after writing them to HDF5.

This can be seen by toggling the is_table value in the script below:

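import pandas as pd

# Toggle between the non-table (False) and table (True) HDF5 formats
is_table = False

# Three hourly timestamps, so the index carries freq=<1 Hour>
times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(xrange(3), index=times)

print 'frequency before =', series.index.freq

# Wrapping the Series gives a DataFrame with a single column named 0
frame = pd.DataFrame(series)

# Write the frame, in table format when is_table is True
with pd.get_store('data/simple.h5') as store:
    store.put('data', frame, table=is_table)

# Read it back and inspect the index frequency of column 0
with pd.get_store('data/simple.h5') as store:
    x = store['data']

print 'frequency after =', x[0].index.freq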

Output with is_table = False:

frequency before = <1 Hour>
frequency after = <1 Hour>

Output with is_table = True:

frequency before = <1 Hour>
frequency after = None

It seems to me that PyTables provides a much richer storage mechanism, so I would not expect the frequency to be lost.

Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?

Answer 1:

I just confirmed with the pandas developers that this is not implemented in the current release.

See https://github.com/pydata/pandas/issues/3499#issuecomment-17262905 for a workaround.

I will update this answer when it becomes available.
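A workaround in the same spirit can be sketched as follows. It is not necessarily the approach given in the linked issue comment: it simply reattaches the frequency after reading, either from a frequency you saved yourself or by inferring it from the timestamps with pd.infer_freq. The helper name restore_freq is hypothetical, and the sketch assumes Python 3 and a reasonably recent pandas (the question's script is Python 2). To keep it self-contained, it simulates the table-format read by building an index with the same timestamps but no freq.

```python
import pandas as pd


def restore_freq(frame, freq=None):
    """Hypothetical helper: return a copy of `frame` whose DatetimeIndex
    carries a frequency again.

    If `freq` is None, it is inferred with pd.infer_freq, which needs at
    least 3 timestamps (ValueError otherwise) and returns None for an
    irregular index.
    """
    if freq is None:
        freq = pd.infer_freq(frame.index)
    if freq is None:
        # Irregular index: there is no frequency to restore.
        return frame
    restored = frame.copy()
    # Rebuilding the DatetimeIndex with an explicit freq also checks that the
    # timestamps conform to it (pandas raises ValueError if they do not).
    restored.index = pd.DatetimeIndex(frame.index, freq=freq)
    return restored


# Simulate what a table-format read returns: same hourly timestamps,
# but the index has no freq attached.
times = pd.date_range("2000-01-01", periods=3, freq=pd.offsets.Hour())
loaded = pd.DataFrame({0: range(3)}, index=pd.DatetimeIndex(list(times)))
print(loaded.index.freq)  # None

fixed = restore_freq(loaded)
print(fixed.index.freq)
```

If the frequency is known when writing, passing it explicitly (for example restore_freq(loaded, freq=pd.offsets.Hour())) avoids relying on inference, which cannot work on fewer than three timestamps.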

Source: https://stackoverflow.com/questions/16311045/in-pandas-why-is-frequency-information-lost-when-storing-in-hdf5-as-a-table

Tags

pandas

hdf5

pytables