How do I read/write to a subgroup within an HDFStore?

Submitted by 断了今生、忘了曾经 on 2019-12-22 00:24:31

Question


I am using an HDFStore to store some of my processed results prior to analysis. I want to put three types of results into the store:

  • Raw results that have not been processed at all, just read in and merged from their original CSV formats
  • Processed results that are derived from the raw results, with some processing and division into more logical groupings
  • Summarised results that have useful summary columns added and redundant columns removed, for easy reading.

I thought an HDFStore with hierarchical keys would do it: one group for Raw, one for Processed and one for Summarised.

I wanted a structure like:

<class 'pandas.io.pytables.HDFStore'>
File path: results.h5
/proccessed/dbn_reinit                           frame        (shape->[22880,19])
/proccessed/dbn_rerep_code                       frame        (shape->[11440,18])
/proccessed/dbn_rerep_enhanced_input             frame        (shape->[11440,18])
/proccessed/linear_classifier                    frame        (shape->[572,18])  
/proccessed/msda_rerep_code                      frame        (shape->[18304,17])
/proccessed/msda_rerep_enhanced_input            frame        (shape->[18304,17])
/raw/dbn_reinit                                  frame        (shape->[22880,15])
/raw/dbn_rerep                                   frame        (shape->[23452,15])
/raw/msda_rerep                                  frame        (shape->[36608,14])
/summerised/dbn_reinit                           frame        (shape->[22880,10])
/summerised/dbn_rerep_code                       frame        (shape->[11440,9]) 
/summerised/dbn_rerep_enhanced_input             frame        (shape->[11440,9]) 
/summerised/linear_classifier                    frame        (shape->[572,6])   
/summerised/msda_rerep_code                      frame        (shape->[18304,10])
/summerised/msda_rerep_enhanced_input            frame        (shape->[18304,10])

I expected I could create this by saying:

store = pandas.HDFStore('results.h5')
store.add_group('raw')
raw_store = store['raw'] 
raw_store['dbn_reinit'] = dbn_reinit_dataframe
raw_store['dbn_rerep_code'] = dbn_rerep_code_dataframe
...

etc

However, there doesn't seem to be a way to get a subgroup of a store and use it as if it were a store,

so I had to do:

import pandas as pd

store = pd.HDFStore('results.h5', mode='w')

store['raw/dbn_reinit'] = dbn_reinit_dataframe
store['raw/dbn_rerep'] = dbn_rerep_dataframe
...

This is wordy, and doesn't really show any grouping of the results into the three categories. Am I missing something? Or are the hierarchical features of HDF just a way of writing really long key names that have /s in them?


Answer 1:


Hierarchical keys are covered in the HDF5 section of the pandas IO docs. .remove() has this type of functionality: you can remove nodes at a given level and everything further down the tree.
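As a rough illustration of that .remove() behaviour, the sketch below writes three frames under hierarchical keys and then removes the whole raw group in one call. The frame contents and the temporary file location are placeholders made up for the example, and it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Placeholder data; the real frames would come from the CSV results.
df = pd.DataFrame({"a": [1, 2, 3]})

# Write to a throwaway file so the sketch does not touch real results.
path = os.path.join(tempfile.mkdtemp(), "results.h5")

with pd.HDFStore(path, mode="w") as store:
    # A key containing "/" creates the intermediate group automatically.
    store["raw/dbn_reinit"] = df
    store["raw/dbn_rerep"] = df
    store["summerised/dbn_reinit"] = df

    keys_before = sorted(store.keys())

    # Removing a group removes every node beneath it as well.
    store.remove("raw")

    keys_after = sorted(store.keys())

print(keys_before)
print(keys_after)
```

Note that store.keys() reports absolute paths with a leading slash, even though the keys were written without one.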

You can call store.get_storer('foo') to return an object that gives access to the node (e.g. via .group). However, this object won't let you add or select sub-nodes, nor does it provide a nice repr of that node.
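Since pandas offers no subgroup object of this kind, one possible workaround is a thin wrapper that prepends the group name to every key. The sketch below is only an illustration: GroupView is a hypothetical helper, not part of pandas. It relies solely on store[key] access and store.keys(), so a plain dict stands in for the store here and the example runs without an HDF5 file.

```python
class GroupView:
    """Hypothetical helper (not part of pandas): a view onto one group of a store.

    Works with any object that supports store[key], store[key] = value
    and store.keys(), which includes pandas.HDFStore.
    """

    def __init__(self, store, group):
        self._store = store
        self._prefix = group.strip("/") + "/"

    def __setitem__(self, name, value):
        # Writes go to "<group>/<name>" in the underlying store.
        self._store[self._prefix + name] = value

    def __getitem__(self, name):
        return self._store[self._prefix + name]

    def keys(self):
        # HDFStore.keys() returns absolute paths such as "/raw/dbn_reinit",
        # so strip the leading slash before matching the group prefix.
        names = []
        for key in self._store.keys():
            key = key.lstrip("/")
            if key.startswith(self._prefix):
                names.append(key[len(self._prefix):])
        return names


# A plain dict stands in for the store; the lists stand in for DataFrames.
store = {}
raw = GroupView(store, "raw")
summarised = GroupView(store, "summerised")

raw["dbn_reinit"] = [1, 2, 3]
summarised["dbn_reinit"] = [6]

print(sorted(store.keys()))
print(raw.keys())
```

With a real store, GroupView(pd.HDFStore('results.h5'), 'raw') should be usable the same way, with DataFrames as values. The wrapper adds no grouping of its own; it only hides the long key names.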

You could put in a feature request for these features on GitHub. Please include a reproducible example of what you think this should do.

Pull-requests are welcome!

I rarely use multiple groups, mainly because using different files is more flexible. You can do what you are trying to do; I have just never found a need for it (e.g. treating a group as if it were the file itself). HDF5 is not a database, so this is rarely useful.



Source: https://stackoverflow.com/questions/25130511/how-do-i-read-write-to-a-subgroup-withing-a-hdf5store
