Question
I'm on pandas 0.14.1. Assume I need to index data by two timestamps in a hierarchical index using timezones. When saving the resulting DataFrame to HDF5, I seem to lose timezone-awareness:
import pandas as pd
dti1 = pd.DatetimeIndex(start=pd.Timestamp('20000101'), end=pd.Timestamp('20000102'), freq='D', tz='EST5EDT')
dti2 = pd.DatetimeIndex(start=pd.Timestamp('20000102'), end=pd.Timestamp('20000103'), freq='D', tz='EST5EDT')
mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=['a'])
Here df has timezones:
a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00 0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00 0
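A note for readers on newer pandas: the start/end/freq keyword arguments of the DatetimeIndex constructor used above were removed in pandas 1.0, so the same frame would presumably be built with pd.date_range there. A minimal sketch, assuming a recent pandas release (the data is the same as in the question):

```python
import pandas as pd

# Assumption: a recent pandas release, where pd.date_range replaces
# DatetimeIndex(start=..., end=..., freq=...).
dti1 = pd.date_range(start='2000-01-01', end='2000-01-02', freq='D', tz='EST5EDT')
dti2 = pd.date_range(start='2000-01-02', end='2000-01-03', freq='D', tz='EST5EDT')

# Two timezone-aware levels combined into one hierarchical index
mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=['a'])

print(df.index.levels[0].tz)
```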
After saving to HDF5 and loading the data back, the timezone information seems to disappear:
df.to_hdf('/tmp/my.h5', 'data')
pd.read_hdf('/tmp/my.h5', 'data')
results in:
a
2000-01-01 05:00:00 2000-01-02 05:00:00 0
2000-01-02 05:00:00 2000-01-03 05:00:00 0
I wonder if there is a good workaround and whether this is a known bug.
Answer 1:
This is not supported under the fixed format (the default for to_hdf) when using a MultiIndex. I suppose it should probably raise NotImplementedError instead. An issue has been opened to track this. See the pandas HDF5 (PyTables) documentation for the full interface.
In [11]: pd.read_hdf('/tmp/my.h5', 'data').index.levels[0]
Out[11]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 05:00:00, 2000-01-02 05:00:00]
Length: 2, Freq: None, Timezone: None
But if you specify the table format, it works:
In [13]: df.to_hdf('/tmp/my.h5', 'data2', format='table')
In [14]: pd.read_hdf('/tmp/my.h5', 'data2')
Out[14]:
a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00 0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00 0
In [15]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[0]
Out[15]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00-05:00, 2000-01-02 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT
In [16]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[1]
Out[16]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-02 00:00:00-05:00, 2000-01-03 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT
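For data that was already written in fixed format, the timezone can probably be reattached after loading. The output in the question shows the values coming back as naive UTC timestamps (05:00:00 instead of 00:00:00-05:00), so one option is to localize each level to UTC and convert it back. The sketch below is only an illustration: restore_level_timezones is a hypothetical helper, not a pandas function, it assumes every index level is a naive datetime level holding UTC values, and it simulates the loaded frame instead of reading the HDF5 file.

```python
import pandas as pd

def restore_level_timezones(df, tz):
    """Hypothetical helper: treat naive index levels as UTC and convert to tz.

    Assumes every level of df.index is a timezone-naive DatetimeIndex
    whose values are UTC, as in the fixed-format output shown above.
    """
    new_levels = [
        level.tz_localize('UTC').tz_convert(tz)
        for level in df.index.levels
    ]
    out = df.copy()
    out.index = out.index.set_levels(new_levels)
    return out

# Simulate what the fixed-format round trip returned: naive UTC timestamps
naive1 = pd.to_datetime(['2000-01-01 05:00:00', '2000-01-02 05:00:00'])
naive2 = pd.to_datetime(['2000-01-02 05:00:00', '2000-01-03 05:00:00'])
loaded = pd.DataFrame({'a': [0, 0]},
                      index=pd.MultiIndex.from_arrays([naive1, naive2]))

restored = restore_level_timezones(loaded, 'EST5EDT')
print(restored.index.levels[0])
```

This relies on knowing the original timezone out of band, since the fixed-format file does not return it. Writing with format='table' as shown above avoids the problem in the first place.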
Source: https://stackoverflow.com/questions/24805307/losing-timezone-awareness-when-saving-hyerarchical-pandas-datetimeindex-to-hdf5