(In Pandas) Why is frequency information lost when storing in HDF5 as a Table?

Submitted by 给你一囗甜甜゛ on 2020-01-05 07:59:07

Question


I am storing time series data in HDF5 format with pandas. Because I want to be able to access the data directly on disk, I am using the PyTables Table format, with table=True when writing.

It appears that I then lose the frequency information on my TimeSeries objects after writing them to HDF5.

This can be seen by toggling the is_table value in the script below:

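# Written for Python 2 and the pandas API of the time (print statement, xrange, pd.get_store).
import pandas as pd

# Toggle this flag: False uses the default storage, True stores as a PyTables Table.
is_table = False

# Three hourly timestamps, so the index carries an explicit frequency.
times = pd.date_range('2000-1-1', periods=3, freq='H')
series = pd.Series(xrange(3), index=times)

print 'frequency before =', series.index.freq

frame = pd.DataFrame(series)

# Write the frame to the HDF5 store.
with pd.get_store('data/simple.h5') as store:
    store.put('data', frame, table=is_table)

# Read it back and inspect the frequency of the restored index.
with pd.get_store('data/simple.h5') as store:
    x = store['data']

print 'frequency after =', x[0].index.freq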

With is_table = False:

frequency before = <1 Hour>
frequency after = <1 Hour>

With is_table = True:

frequency before = <1 Hour>
frequency after = None

Since PyTables provides a much richer storage mechanism, I would not have expected this information to be lost.

Is there a fundamental reason that PyTables cannot store, or reproduce, this information? Or is this possibly a bug in pandas?


Answer 1:


I just confirmed with the pandas developers that this is not implemented in the current release.

See https://github.com/pydata/pandas/issues/3499#issuecomment-17262905 for a workaround.

I will update this answer when it becomes available.
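
Separately from the workaround in the linked issue (which is not reproduced here), one possible way to get the frequency back after reading is to infer it from the timestamps themselves. The following is only a minimal sketch under some assumptions: restore_freq is a hypothetical helper name, the index must be regularly spaced with at least three timestamps, and the frequency loss is simulated by rebuilding the index from raw values rather than by an actual HDF5 round trip.

```python
import pandas as pd


def restore_freq(obj):
    """Hypothetical helper: return obj with an inferred frequency on its index."""
    # infer_freq needs at least three regularly spaced timestamps.
    inferred = pd.infer_freq(obj.index)
    if inferred is None:
        # Irregular index: there is no single frequency to restore.
        return obj
    # asfreq conforms the data to a regular index that carries the frequency.
    return obj.asfreq(inferred)


times = pd.date_range('2000-01-01', periods=3, freq=pd.offsets.Hour())

# Simulate the loss (assumption): an index rebuilt from raw values has freq None.
stripped = pd.DataFrame({'value': range(3)}, index=pd.DatetimeIndex(times.values))

restored = restore_freq(stripped)
print('frequency after reading =', stripped.index.freq)
print('frequency restored      =', restored.index.freq)
```

Inference only works when the stored timestamps really are evenly spaced. If the frequency is known in advance, passing it to asfreq directly avoids the guesswork.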



Source: https://stackoverflow.com/questions/16311045/in-pandas-why-is-frequency-information-lost-when-storing-in-hdf5-as-a-table
