Question
I am using pandas' HDFStore to store some of my processed results prior to analysis. I want to put three types of results into the store:
- Raw results, that have not been processed at all, just read-in and merged from their original CSV formats
- Processed results that are derived from the raw results, with some processing and division into more logical groupings
- Summarised results that have useful summary columns added and redundant columns removed, for easy reading.
I thought an HDFStore with hierarchical keys would do it: one for Raw, one for Processed and one for Summarised.
I wanted a structure like:
<class 'pandas.io.pytables.HDFStore'>
File path: results.h5
/proccessed/dbn_reinit frame (shape->[22880,19])
/proccessed/dbn_rerep_code frame (shape->[11440,18])
/proccessed/dbn_rerep_enhanced_input frame (shape->[11440,18])
/proccessed/linear_classifier frame (shape->[572,18])
/proccessed/msda_rerep_code frame (shape->[18304,17])
/proccessed/msda_rerep_enhanced_input frame (shape->[18304,17])
/raw/dbn_reinit frame (shape->[22880,15])
/raw/dbn_rerep frame (shape->[23452,15])
/raw/msda_rerep frame (shape->[36608,14])
/summerised/dbn_reinit frame (shape->[22880,10])
/summerised/dbn_rerep_code frame (shape->[11440,9])
/summerised/dbn_rerep_enhanced_input frame (shape->[11440,9])
/summerised/linear_classifier frame (shape->[572,6])
/summerised/msda_rerep_code frame (shape->[18304,10])
/summerised/msda_rerep_enhanced_input frame (shape->[18304,10])
I expected I could create this by saying:
store = pandas.HDFStore('results.h5')
store.add_group('raw')
raw_store = store['raw']
raw_store['dbn_reinit'] = dbn_reinit_dataframe
raw_store['dbn_rerep_code'] = dbn_rerep_code_dataframe
...
etc
However, there doesn't seem to be a way of getting a subgroup of a store and using it as if it were a store,
so I had to do:
store = pd.HDFStore('results.h5', mode='w')
store['raw/dbn_reinit'] = dbn_reinit_dataframe
store['raw/dbn_rerep'] = dbn_rerep_dataframe
...
which is wordy, and doesn't really show any kind of grouping of the results into the 3 categories.
Am I missing something?
Or are the hierarchical features of HDF just a matter of writing really long key names that have "/" characters in them?
Answer 1:
The pandas documentation covers using hierarchical keys. .remove() has this type of functionality: you can remove nodes at that level and further down the tree.
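As a minimal sketch of that behaviour (assuming pandas with PyTables installed; the DataFrame contents and the temporary file path are made up for illustration), removing the "raw" key drops every frame stored beneath it:

```python
import os
import tempfile

import pandas as pd

# Stand-in data; the real frames would come from the merged CSV files.
frame = pd.DataFrame({"accuracy": [0.91, 0.87]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.h5")
    with pd.HDFStore(path, mode="w") as store:
        # Slashes in the key create nested groups in the HDF5 file.
        store["raw/dbn_reinit"] = frame
        store["raw/dbn_rerep"] = frame
        store["processed/dbn_reinit"] = frame
        before = sorted(store.keys())

        # Removing the parent node also removes everything below it.
        store.remove("raw")
        after = sorted(store.keys())

print(before)
print(after)
```

So the hierarchy is real at the file level, even though reads and writes still go through the full key on the top-level store.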
You can do store.get_storer('foo') to return an object that includes access to the node (e.g. .group). However, this object won't allow you to add/select sub-nodes, nor does it provide a nice repr of that node.
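Until something like that exists in pandas, one possible workaround is a thin wrapper that prefixes keys for you. The sketch below is not part of pandas: GroupView and its methods are hypothetical names, and a plain dict stands in for an open HDFStore here, on the assumption that the store only needs to support item get and set with string keys.

```python
class GroupView:
    """Hypothetical helper: treat one key prefix of a store as if it were a store."""

    def __init__(self, store, prefix):
        self._store = store
        self._prefix = prefix.strip("/")

    def _full_key(self, key):
        # Join the group prefix and the short key into one hierarchical key.
        return f"{self._prefix}/{key.strip('/')}"

    def __setitem__(self, key, value):
        self._store[self._full_key(key)] = value

    def __getitem__(self, key):
        return self._store[self._full_key(key)]

    def group(self, name):
        # Nested view, e.g. GroupView(store, "raw").group("dbn")
        return GroupView(self._store, self._full_key(name))


backing = {}  # stand-in for an open pandas.HDFStore (assumption)
raw = GroupView(backing, "raw")
raw["dbn_reinit"] = "dbn_reinit_dataframe"  # placeholder values, not real frames
raw["dbn_rerep"] = "dbn_rerep_dataframe"

print(sorted(backing))
```

With a real store you would pass the HDFStore object instead of the dict; the keys written are the same "raw/..." strings as in the question, just built in one place.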
You could put in a feature request for these features on GitHub. Please include a reproducible example of what you think this should do.
Pull-requests are welcome!
I rarely use multiple groups, mainly because using different files is more flexible. You can do what you are trying to do (e.g. treat your group as the file itself); I have just never found a need for it. HDF5 is not a database, so this is rarely useful.
Source: https://stackoverflow.com/questions/25130511/how-do-i-read-write-to-a-subgroup-withing-a-hdf5store