reading nested .h5 group into numpy array


I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with these kinds of files. I found ma


1 answer

甜味超标

2021-02-09 17:48

You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or dtype; datasets do.
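To make the group/dataset distinction concrete, here is a minimal sketch that builds a throwaway in-memory file and inspects one group and one dataset. The helper `describe` and the `group1/data` layout are hypothetical, chosen only for illustration; `io.BytesIO` is used so nothing is written to disk.

```python
import io
import h5py
import numpy as np

def describe(obj):
    """Return (shape, dtype) for a dataset; None for a group, which has neither."""
    if isinstance(obj, h5py.Dataset):
        return obj.shape, obj.dtype
    return None

# Throwaway in-memory file with a hypothetical layout, for illustration only.
# create_dataset creates the intermediate group 'group1' automatically.
with h5py.File(io.BytesIO(), 'w') as f:
    f.create_dataset('group1/data', data=np.zeros((2, 3), dtype='float64'))
    print(describe(f['group1']))       # a group: no shape or dtype
    print(describe(f['group1/data']))  # a dataset: shape (2, 3), dtype float64
```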

Assuming you do not know the structure of your hierarchy in advance, you can use a recursive generator that yields the full path of every dataset, in the form /group1/group2/.../dataset. Below is an example.

```python
import h5py

def traverse_datasets(hdf_file):
    """Yield the full path of every dataset in hdf_file, at any depth."""

    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = f'{prefix}/{key}'
            if isinstance(item, h5py.Dataset):  # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group):  # test for group (go down)
                yield from h5py_dataset_iterator(item, path)

    for path, _ in h5py_dataset_iterator(hdf_file):
        yield path
```
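As an alternative to hand-written recursion, h5py provides `Group.visititems`, which calls a function with `(name, obj)` for every object below a group. A minimal sketch follows; the helper name `list_dataset_paths` and the sample layout are hypothetical. Note that the names it reports are relative and have no leading slash (e.g. `group1/group2/dataset`), whereas the generator above produces `/group1/group2/dataset`; both forms work with `f[path]`.

```python
import io
import h5py
import numpy as np

def list_dataset_paths(group):
    """Return the paths of all datasets below `group`, using Group.visititems."""
    paths = []

    def visitor(name, obj):
        # name is the path relative to `group`, without a leading slash.
        if isinstance(obj, h5py.Dataset):
            paths.append(name)
        # Returning None (implicitly) tells visititems to keep walking.

    group.visititems(visitor)
    return paths

# Usage on a throwaway in-memory file (hypothetical layout).
with h5py.File(io.BytesIO(), 'w') as f:
    f.create_dataset('group1/group2/dataset', data=np.arange(3))
    f.create_dataset('top', data=1.0)
    for path in list_dataset_paths(f):
        print(path, f[path].shape, f[path].dtype)
```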

You can, for example, iterate over all dataset paths and print the properties that interest you:

```python
# filename is the path to your .h5 file
with h5py.File(filename, 'r') as f:
    for dset in traverse_datasets(f):
        print('Path:', dset)
        print('Shape:', f[dset].shape)
        print('Data type:', f[dset].dtype)
```

Remember that, by default, h5py does not read a dataset into memory when you access it: f[dset] is only a handle to the data stored in the file. You can load it into memory as a NumPy array via arr = f[dset][:], where dset is the full path.
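Putting the pieces together, a minimal sketch that walks the hierarchy with the same recursive idea and copies every dataset into a dict of NumPy arrays. The function name `load_all_datasets` and the sample layout are hypothetical, and this only makes sense if the whole file fits in RAM. It uses `item[()]` rather than `[:]` because `[()]` also works on scalar (0-d) datasets, where `[:]` raises an error.

```python
import io
import h5py
import numpy as np

def load_all_datasets(group, prefix=''):
    """Recursively read every dataset below `group` into memory.

    Returns a dict mapping full paths ('/group1/group2/dataset') to NumPy
    arrays. Only sensible when everything fits in RAM.
    """
    arrays = {}
    for key, item in group.items():
        path = f'{prefix}/{key}'
        if isinstance(item, h5py.Dataset):
            # [()] reads the whole dataset; np.asarray turns a scalar
            # dataset's value into a 0-d array so every value is an ndarray.
            arrays[path] = np.asarray(item[()])
        elif isinstance(item, h5py.Group):
            arrays.update(load_all_datasets(item, path))
    return arrays

# Usage on a throwaway in-memory file (hypothetical layout).
with h5py.File(io.BytesIO(), 'w') as f:
    f.create_dataset('group1/group2/dataset', data=np.arange(6).reshape(2, 3))
    f.create_dataset('scale', data=2.5)
    data = load_all_datasets(f)

# The arrays stay usable after the file is closed, because they were copied.
print(data['/group1/group2/dataset'].sum())  # 15
print(float(data['/scale']))                 # 2.5
```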
