I am using the pandas HDFStore to store some of my processed results prior to analysis. I want to put three types of results into the store:
- Raw results that have not been processed at all, just read in and merged from their original CSV formats.
- Processed results that are derived from the raw results, with some processing and division into more logical groupings.
- Summarised results that have useful summary columns added and redundant columns removed, for easy reading.
I thought an HDFStore with hierarchical keys would do it, with one group each for raw, processed, and summarised.
I wanted a structure like:
    <class 'pandas.io.pytables.HDFStore'>
    File path: results.h5
    /proccessed/dbn_reinit                  frame  (shape->[22880,19])
    /proccessed/dbn_rerep_code              frame  (shape->[11440,18])
    /proccessed/dbn_rerep_enhanced_input    frame  (shape->[11440,18])
    /proccessed/linear_classifier           frame  (shape->[572,18])
    /proccessed/msda_rerep_code             frame  (shape->[18304,17])
    /proccessed/msda_rerep_enhanced_input   frame  (shape->[18304,17])
    /raw/dbn_reinit                         frame  (shape->[22880,15])
    /raw/dbn_rerep                          frame  (shape->[23452,15])
    /raw/msda_rerep                         frame  (shape->[36608,14])
    /summerised/dbn_reinit                  frame  (shape->[22880,10])
    /summerised/dbn_rerep_code              frame  (shape->[11440,9])
    /summerised/dbn_rerep_enhanced_input    frame  (shape->[11440,9])
    /summerised/linear_classifier           frame  (shape->[572,6])
    /summerised/msda_rerep_code             frame  (shape->[18304,10])
    /summerised/msda_rerep_enhanced_input   frame  (shape->[18304,10])
I expected I could create this by saying:
    store = pandas.HDFStore('results.h5')
    store.add_group('raw')
    raw_store = store['raw']
    raw_store['dbn_reinit'] = dbn_reinit_dataframe
    raw_store['dbn_rerep_code'] = dbn_rerep_code_dataframe
    ...

and so on.
However, there doesn't seem to be a way of getting a subgroup of a store and using it as if it were a store itself, so I had to do:
    store = pd.HDFStore('results.h5', mode='w')
    store['raw/dbn_reinit'] = dbn_reinit_dataframe
    store['raw/dbn_rerep'] = dbn_rerep_dataframe
    ...
which is wordy and doesn't really show any grouping of the results into the three categories. Am I missing something? Or are the hierarchical features of HDF just a matter of writing really long key names that have /s in them?
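To make concrete what I mean by using a subgroup as if it were a store, here is a minimal sketch of the behaviour I am after. `GroupView` is a hypothetical helper of my own, not part of pandas; it only assumes the underlying store supports `store[key] = value` and `store[key]`, which HDFStore does. A plain dict stands in for the store so the sketch runs without an HDF5 file.

```python
class GroupView:
    """Hypothetical helper (not a pandas class): a view onto a store
    that prefixes every key with a fixed group name."""

    def __init__(self, store, group):
        self._store = store
        self._prefix = group.strip("/")

    def _full_key(self, key):
        # Build "group/key", tolerating stray slashes on either part.
        return f"{self._prefix}/{key.strip('/')}"

    def __setitem__(self, key, value):
        self._store[self._full_key(key)] = value

    def __getitem__(self, key):
        return self._store[self._full_key(key)]


# A dict stands in for the HDFStore here; the strings stand in for DataFrames.
store = {}
raw = GroupView(store, "raw")
raw["dbn_reinit"] = "placeholder for dbn_reinit_dataframe"
raw["dbn_rerep"] = "placeholder for the raw dbn_rerep results"

print(sorted(store))  # ['raw/dbn_reinit', 'raw/dbn_rerep']
```

Under the hood this is still just building the long key names, so it only tidies the calling code; I would prefer something built in if it exists.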