reading nested .h5 group into numpy array

感动是毒 2021-02-09 17:16

I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found ma

1 answer
  • 2021-02-09 17:48

    You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or dtype; datasets do.
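To see the difference concretely, here is a minimal sketch that builds a small in-memory file with one group and one dataset, then checks which of the two has a shape. It assumes h5py's `core` driver with `backing_store=False`, which keeps the file in RAM so nothing is written to disk; the names `grp` and `values` are hypothetical.

```python
import h5py
import numpy as np

# 'core' driver + backing_store=False: the file lives only in memory,
# so this runs without any real .h5 file on disk.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    grp = f.create_group('grp')  # hypothetical group name
    dset = grp.create_dataset('values', data=np.arange(6).reshape(2, 3))

    group_has_shape = hasattr(grp, 'shape')  # groups carry no shape/dtype
    dataset_shape = dset.shape               # datasets do
    print(isinstance(grp, h5py.Group), group_has_shape)
    print(isinstance(dset, h5py.Dataset), dataset_shape, dset.dtype)
```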

    Assuming you do not know the structure of the hierarchy in advance, you can use a recursive generator to yield the full path of every dataset, in the form /group1/group2/.../dataset. Below is an example.

    import h5py
    
    def traverse_datasets(hdf_file):
    
        def h5py_dataset_iterator(g, prefix=''):
            for key in g.keys():
                item = g[key]
                path = f'{prefix}/{key}'
                if isinstance(item, h5py.Dataset): # test for dataset
                    yield (path, item)
                elif isinstance(item, h5py.Group): # test for group (go down)
                    yield from h5py_dataset_iterator(item, path)
    
        for path, _ in h5py_dataset_iterator(hdf_file):
            yield path
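As an alternative to writing the recursion yourself, h5py's `Group.visititems(func)` walks everything below a group and calls `func(name, obj)` for each member, where `name` is the path relative to that group (with no leading slash, unlike the paths produced above). The walk stops as soon as `func` returns something other than `None`, so this sketch collects paths into a list rather than returning them from the callback. `list_dataset_paths` and the demo names are hypothetical.

```python
import h5py
import numpy as np

def list_dataset_paths(group):
    """Return the paths (relative to `group`) of every dataset below it."""
    paths = []

    def collect(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            paths.append(name)
        # Implicitly returning None tells visititems to keep going.

    group.visititems(collect)
    return paths

# Usage demo on a small nested in-memory file (hypothetical names).
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('a/b/x', data=np.zeros(3))       # intermediate groups are created
    f.create_dataset('a/y', data=np.ones((2, 2)))
    demo_paths = list_dataset_paths(f)
    for path in demo_paths:
        print(path, f[path].shape, f[path].dtype)
```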
    

    You can, for example, iterate over all dataset paths and print the properties that interest you, such as shape and dtype:

    filename = 'data.h5'  # placeholder: replace with the path to your .h5 file
    
    with h5py.File(filename, 'r') as f:
        for dset in traverse_datasets(f):
            print('Path:', dset)
            print('Shape:', f[dset].shape)
            print('Data type:', f[dset].dtype)
    

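    Remember that h5py does not load a dataset into memory when you open it; data is read from disk only when you index the dataset. To read a whole dataset into a NumPy array, use arr = f[dset][:], where dset is the full path (for a scalar dataset, use f[dset][()] instead).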
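To get a whole nested file into NumPy in one go, here is a minimal sketch, assuming the data fits comfortably in memory, that copies a file or group into nested Python dicts mirroring the HDF5 hierarchy. Each dataset is read with `item[()]`, which, unlike `[:]`, also works for scalar datasets. The demo also shows the lazy behaviour: slicing a dataset reads just that selection. `group_to_dict` and the demo names are hypothetical.

```python
import h5py
import numpy as np

def group_to_dict(group):
    """Recursively copy an h5py File/Group into nested dicts of NumPy data.

    Datasets become NumPy arrays (0-d datasets come back as NumPy scalars);
    subgroups become nested dicts with the same keys as in the file.
    """
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Dataset):
            out[key] = item[()]              # read the whole dataset now
        elif isinstance(item, h5py.Group):
            out[key] = group_to_dict(item)   # descend into the subgroup
    return out

# Usage demo on an in-memory file (hypothetical names); with a real file,
# open it with h5py.File(filename, 'r') instead.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('group1/group2/data', data=np.arange(4.0))
    f.create_dataset('scalar', data=3.5)
    part = f['group1/group2/data'][1:3]  # a slice reads just this selection
    data = group_to_dict(f)              # reads every dataset

# The arrays stay usable after the file is closed.
print(data['group1']['group2']['data'], data['scalar'], part)
```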
