I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found ma
You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or type; datasets do.
Assuming you do not know your hierarchy structure in advance, you can use a recursive generator to yield the full paths to all available datasets, in the form group1/group2/.../dataset. Below is an example.
    import h5py

    def traverse_datasets(hdf_file):
        def h5py_dataset_iterator(g, prefix=''):
            for key in g.keys():
                item = g[key]
                path = f'{prefix}/{key}'
                if isinstance(item, h5py.Dataset):  # test for dataset
                    yield (path, item)
                elif isinstance(item, h5py.Group):  # test for group (go down)
                    yield from h5py_dataset_iterator(item, path)

        for path, _ in h5py_dataset_iterator(hdf_file):
            yield path
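As an alternative to writing the recursion by hand, h5py groups also provide a visititems method that walks the hierarchy for you. The sketch below is only an illustration: the helper name list_dataset_paths, the file name demo.h5, and the group and dataset names are made up so that the example runs on its own. Note that visititems reports names relative to the group it is called on, so the paths have no leading slash.

```python
import os
import tempfile

import h5py
import numpy as np


def list_dataset_paths(hdf_file):
    """Collect the paths of all datasets using h5py's built-in visitor."""
    paths = []

    def collect(name, obj):
        # visititems calls this for every group and dataset below hdf_file;
        # returning None keeps the traversal going.
        if isinstance(obj, h5py.Dataset):
            paths.append(name)

    hdf_file.visititems(collect)
    return paths


# Build a small throwaway file (hypothetical names) so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("group1/group2/values", data=np.arange(6).reshape(2, 3))
    f.create_dataset("group1/scalars", data=np.ones(4))

with h5py.File(demo_path, "r") as f:
    dataset_paths = sorted(list_dataset_paths(f))
    print(dataset_paths)
```

Either approach gives you paths that can be used directly as keys on the open file, which is what the rest of this answer relies on.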
You can, for example, iterate over all dataset paths and print the attributes that interest you:

    # filename is the path to your .h5 file
    with h5py.File(filename, 'r') as f:
        for dset in traverse_datasets(f):
            print('Path:', dset)
            print('Shape:', f[dset].shape)
            print('Data type:', f[dset].dtype)
Remember that, by default, h5py does not read a dataset's contents into memory when you access it; f[dset] is only a handle to the data on disk. You can read the whole array into memory via arr = f[dset][:], where dset is the full path.
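To make the difference between a dataset handle and an in-memory array concrete, here is a minimal sketch. The file name demo_read.h5 and the dataset path group1/values are invented for the example; substitute a path from your own file. Slicing the handle reads only the requested part, which helps when the dataset is too large to load at once.

```python
import os
import tempfile

import h5py
import numpy as np

# Create a small throwaway file (hypothetical names) so the sketch is runnable.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_read.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("group1/values", data=np.arange(12).reshape(3, 4))

with h5py.File(demo_path, "r") as f:
    handle = f["group1/values"]  # h5py.Dataset handle; no data loaded yet
    full = handle[:]             # whole dataset as a NumPy array
    first_row = handle[0, :]     # only the first row is read
    corner = handle[:2, :2]      # only a 2x2 block is read

# The arrays remain usable after the file is closed; the handle does not.
print(type(full).__name__, full.shape)
print(first_row)
print(corner)
```

The slices are taken inside the with block because a dataset handle can no longer be read once its file has been closed.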