I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.
The financial dat
ohlcv_candle.to_hdf('test.h5', key='this_is_a_key', append=True, mode='r+', format='t')
You need to pass another argument, append=True, to specify that the data is to be appended to existing data if any is found under that key, instead of overwriting it. Without it, append defaults to False, and if to_hdf encounters an existing table under 'this_is_a_key', it overwrites that table.
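A minimal sketch of the difference, assuming pandas with PyTables installed. The make_candle helper and its OHLCV values are hypothetical stand-ins for the question's data, and the file goes into a temporary directory. It uses mode='a' instead of 'r+' because 'r+' requires the file to exist already.

```python
import os
import tempfile

import pandas as pd


def make_candle(minute: int) -> pd.DataFrame:
    """Hypothetical one-row OHLCV frame for a given minute (made-up values)."""
    idx = pd.DatetimeIndex([pd.Timestamp("2024-01-02 09:30") + pd.Timedelta(minutes=minute)])
    return pd.DataFrame(
        {"open": [100.0], "high": [101.0], "low": [99.5], "close": [100.5], "volume": [1200]},
        index=idx,
    )


path = os.path.join(tempfile.mkdtemp(), "test.h5")

# append=True: each call adds rows to the table stored under 'this_is_a_key'.
for minute in range(3):
    make_candle(minute).to_hdf(path, key="this_is_a_key", mode="a", append=True, format="t")
appended_rows = len(pd.read_hdf(path, key="this_is_a_key"))

# append left at its default (False): the existing table under the key is replaced.
make_candle(3).to_hdf(path, key="this_is_a_key", mode="a", format="t")
overwritten_rows = len(pd.read_hdf(path, key="this_is_a_key"))

print(appended_rows, overwritten_rows)  # 3 1
```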
The mode= argument works only at the file level: it says whether the file as a whole is to be overwritten or appended to. One file can hold any number of keys, so a mode='a', append=False setting means only the one key being written gets overwritten while the other keys stay.
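To make the file-level vs. key-level distinction concrete, here is a small sketch, assuming pandas with PyTables. The key names 'AAPL' and 'MSFT' are hypothetical. It writes two keys, replaces one with mode='a' (append left at False), and then shows that mode='w' truncates the whole file.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "keys.h5")
df = pd.DataFrame({"close": [1.0, 2.0, 3.0]})

# Two keys in one file (hypothetical names).
df.to_hdf(path, key="AAPL", mode="a", format="t")
df.to_hdf(path, key="MSFT", mode="a", format="t")

# mode='a' keeps the file; append=False (the default) replaces only the 'AAPL' key.
df.head(1).to_hdf(path, key="AAPL", mode="a", format="t")
with pd.HDFStore(path, mode="r") as store:
    after_a = {k: len(store[k]) for k in sorted(store.keys())}
print(after_a)  # {'/AAPL': 1, '/MSFT': 3}

# mode='w' truncates the whole file first, so only the key just written survives.
df.to_hdf(path, key="AAPL", mode="w", format="t")
with pd.HDFStore(path, mode="r") as store:
    keys_after_w = store.keys()
print(keys_after_w)  # ['/AAPL']
```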
I had a similar experience to yours and found the additional append argument in the reference docs. After setting it, appending now works properly for me.
Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
Note: HDF5 won't do anything about the DataFrame's index. Appended rows are stored even if their index values already exist in the table, so duplicates can build up. We need to sort those out before putting the data in or when we take it out.
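One way to handle it on the way out, sketched under the assumption that the latest row for a timestamp should win (that policy is my choice for illustration, not something pandas decides). The "re-sent" 09:31 candle is hypothetical data.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "dupes.h5")
ts = pd.DatetimeIndex(["2024-01-02 09:30", "2024-01-02 09:31"])
first = pd.DataFrame({"close": [100.0, 100.5]}, index=ts)
# Hypothetical re-sent candle for 09:31 with a corrected close.
resent = pd.DataFrame({"close": [100.7]}, index=ts[1:])

first.to_hdf(path, key="candles", append=True, format="t")
resent.to_hdf(path, key="candles", append=True, format="t")

raw = pd.read_hdf(path, key="candles")
print(len(raw))  # 3: the 09:31 timestamp is stored twice

# Keep the last row per timestamp, then sort by index.
clean = raw[~raw.index.duplicated(keep="last")].sort_index()
print(clean["close"].tolist())  # [100.0, 100.7]
```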
pandas.HDFStore.put() has a parameter append (which defaults to False), which instructs Pandas to overwrite instead of appending.
So try this:
import pandas as pd
store = pd.HDFStore('test.h5')
store.append('name_of_frame', ohlcv_candle, format='t', data_columns=True)
store.close()
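For the once-a-minute update case, a sketch that simulates three incoming candles (hypothetical values) and appends each one. It also shows one thing data_columns=True buys you: the columns become queryable in a where clause. Using HDFStore as a context manager, so the file is closed automatically, is a stylistic choice here.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "test.h5")
start = pd.Timestamp("2024-01-02 09:30")

with pd.HDFStore(path, mode="a") as store:
    # Simulate three one-minute updates arriving one at a time (made-up prices).
    for minute, close in enumerate([100.0, 101.5, 99.0]):
        candle = pd.DataFrame(
            {"close": [close], "volume": [1000 + minute]},
            index=[start + pd.Timedelta(minutes=minute)],
        )
        store.append("name_of_frame", candle, format="t", data_columns=True)

    # data_columns=True makes every column usable in a where clause.
    high = store.select("name_of_frame", where="close > 100")
    total = len(store["name_of_frame"])

print(total, high["close"].tolist())  # 3 [101.5]
```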
We can also use store.put(..., append=True), but then the data must also be stored in table format:
store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)
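A quick check of put() with and without append=True, assuming pandas with PyTables. The two-row batch is hypothetical. Note that its default RangeIndex (0, 1) is simply stored twice, which ties in with the earlier note about duplicate index values.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "put.h5")
batch = pd.DataFrame({"close": [100.0, 100.5]})

with pd.HDFStore(path, mode="a") as store:
    # append=True: the second put adds its rows to the existing table.
    store.put("name_of_frame", batch, format="t", append=True, data_columns=True)
    store.put("name_of_frame", batch, format="t", append=True, data_columns=True)
    rows_after_append = len(store["name_of_frame"])  # 4

    # Same call without append=True: put() replaces the table.
    store.put("name_of_frame", batch, format="t", data_columns=True)
    rows_after_put = len(store["name_of_frame"])  # 2
```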
NOTE: appending works only for the table format (format='t' is an alias for format='table').
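To see what happens otherwise, this sketch stores a frame in fixed format (the default for put()) and then tries to append to it. pandas rejects this with a ValueError. The exact message text may vary between versions, so the example only prints it rather than relying on it.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "fixed.h5")
df = pd.DataFrame({"close": [100.0]})

with pd.HDFStore(path, mode="a") as store:
    store.put("fixed_frame", df, format="fixed")  # also what put() uses by default
    try:
        store.append("fixed_frame", df)
        append_error = None
    except ValueError as exc:
        append_error = str(exc)

print(append_error)
```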