How do I read/write to a subgroup within an HDFStore?

Question

I am using an HDFStore to store some of my processed results prior to analysis. I want to put three types of results into the store:

Raw results, which have not been processed at all, just read in and merged from their original CSV formats.
Processed results, which are derived from the raw results, with some processing and division into more logical groupings.
Summarised results, which have useful summary columns added and redundant columns removed, for easy reading.

I thought an HDFStore with hierarchical keys would do it: one group for raw, one for processed and one for summarised.

I wanted a structure like:

<class 'pandas.io.pytables.HDFStore'>
File path: results.h5
/proccessed/dbn_reinit                           frame        (shape->[22880,19])
/proccessed/dbn_rerep_code                       frame        (shape->[11440,18])
/proccessed/dbn_rerep_enhanced_input             frame        (shape->[11440,18])
/proccessed/linear_classifier                    frame        (shape->[572,18])  
/proccessed/msda_rerep_code                      frame        (shape->[18304,17])
/proccessed/msda_rerep_enhanced_input            frame        (shape->[18304,17])
/raw/dbn_reinit                                  frame        (shape->[22880,15])
/raw/dbn_rerep                                   frame        (shape->[23452,15])
/raw/msda_rerep                                  frame        (shape->[36608,14])
/summerised/dbn_reinit                           frame        (shape->[22880,10])
/summerised/dbn_rerep_code                       frame        (shape->[11440,9]) 
/summerised/dbn_rerep_enhanced_input             frame        (shape->[11440,9]) 
/summerised/linear_classifier                    frame        (shape->[572,6])   
/summerised/msda_rerep_code                      frame        (shape->[18304,10])
/summerised/msda_rerep_enhanced_input            frame        (shape->[18304,10])

I expected I could create this by saying:

store = pandas.HDFStore('results.h5')
store.add_group('raw')
raw_store = store['raw'] 
raw_store['dbn_reinit'] = dbn_reinit_dataframe
raw_store['dbn_rerep_code'] = dbn_rerep_code_dataframe
...

etc

However, there doesn't seem to be a way of getting a subgroup of a store and using it as if it were a store itself,

so I had to do:

store = pd.HDFStore('results.h5', mode='w')

store['raw/dbn_reinit'] = dbn_reinit_dataframe
store['raw/dbn_rerep'] = dbn_rerep_dataframe
...

which is wordy, and doesn't really show any kind of grouping of the results into the three categories. Am I missing something? Or are the hierarchical features of HDF just a matter of writing really long key names that have /s in them?

Answer 1:

The pandas docs have a section on using hierarchical keys. .remove() has this type of functionality: you can remove a node at a given level together with everything further down the tree.

You can call store.get_storer('foo') to get an object that gives access to the node (e.g. via .group). However, this object won't allow you to add or select sub-nodes, nor does it provide a nice repr of that node.
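
Since there is no built-in group view, one possible workaround is a thin wrapper that prefixes every key with the group name. The following is only a sketch and not part of pandas: GroupView and its methods are hypothetical names, and the wrapper assumes nothing more than a store supporting store[key], store[key] = value and .keys(), which HDFStore does. A plain dict stands in for the open store so the snippet runs without PyTables installed.

```python
class GroupView:
    """Dict-like view onto one group of a key/value store (hypothetical helper)."""

    def __init__(self, store, group):
        self._store = store
        # Normalise to an absolute prefix such as "/raw/".
        self._prefix = "/" + group.strip("/") + "/"

    def _full_key(self, name):
        # Build the full hierarchical key, e.g. "/raw/dbn_reinit".
        return self._prefix + name.lstrip("/")

    def __setitem__(self, name, value):
        self._store[self._full_key(name)] = value

    def __getitem__(self, name):
        return self._store[self._full_key(name)]

    def keys(self):
        # Only keys inside this group, with the group prefix stripped.
        return [
            key[len(self._prefix):]
            for key in self._store.keys()
            if key.startswith(self._prefix)
        ]


# A plain dict stands in for an open pd.HDFStore (assumption for this demo).
backing = {}
raw = GroupView(backing, "raw")
raw["dbn_reinit"] = [1, 2, 3]   # placeholder values instead of DataFrames
raw["dbn_rerep"] = [4, 5]

print(sorted(backing))      # full keys as the underlying store sees them
print(sorted(raw.keys()))   # short names as seen through the view
```

With a real store you would pass pd.HDFStore('results.h5') in place of the dict, so that raw['dbn_reinit'] = dbn_reinit_dataframe writes to /raw/dbn_reinit. The wrapper only rewrites keys; it does not change how the data is laid out in the file.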

You could put in a feature request for this on GitHub. Please include a reproducible example of what you think it should do.

Pull-requests are welcome!

I rarely use multiple groups, mainly because of the flexibility of using different files. You can do what you are trying to do (e.g. treat your group as the file itself); I have just never found a need for it. HDF5 is not a database, so this is rarely useful.

Source: https://stackoverflow.com/questions/25130511/how-do-i-read-write-to-a-subgroup-withing-a-hdf5store

Tags

python

pandas

hdf5

pytables