Question
I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get to the lower levels of the groups (folders) the file contains. The file contains two main folders, X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to read is in A, B, 1, 2, ..., 10. For instance, I start with:
import h5py
f = h5py.File(filename, 'r')
f.keys()
Now f.keys() returns [u'X', u'Y'], the two main folders.
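To see the lower levels by hand, you can call keys() on each group in turn, because indexing a File or Group by name returns the child object. The sketch below assumes Python 3 and builds a hypothetical in-memory toy file (driver="core", backing_store=False; the name "toy.h5" is never written to disk) with made-up values in the layout described above, so it runs without the real file.

```python
import h5py
import numpy as np

# Hypothetical toy file held only in memory, laid out like the file in the
# question (X/0/A, X/0/B, Y/1 ... Y/10); the values are made up.
with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("X/0/A", data=np.arange(6.0).reshape(2, 3))
    f.create_dataset("X/0/B", data=np.ones(4))
    for i in range(1, 11):
        f.create_dataset(f"Y/{i}", data=np.full(3, float(i)))

    # keys() returns a view in Python 3; wrap it in list() to see the names.
    top = list(f.keys())                 # ['X', 'Y']
    below_x = list(f["X"].keys())        # ['0']
    below_0 = list(f["X"]["0"].keys())   # ['A', 'B']
    # Names are strings, so by default they come back in alphanumeric
    # order ('1', '10', '2', ...), not numeric order.
    below_y = list(f["Y"].keys())
```

Each step down is just another name lookup on the group returned by the previous one.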
Then I try to read X and Y using read_direct, but I get the error:
AttributeError: 'Group' object has no attribute 'read_direct'
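The error arises because read_direct is a method of h5py.Dataset, not of h5py.Group; it copies data into a NumPy array that you allocate beforehand. A minimal sketch, again assuming a hypothetical in-memory toy file with made-up values:

```python
import h5py
import numpy as np

with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("X/0/A", data=np.arange(6.0).reshape(2, 3))  # made-up data

    # Groups such as X have no read_direct method; datasets such as X/0/A do.
    group_has_read_direct = hasattr(f["X"], "read_direct")

    dset = f["X/0/A"]
    # Pre-allocate the destination with the dataset's shape and dtype,
    # then let read_direct fill it in place.
    out = np.empty(dset.shape, dtype=dset.dtype)
    dset.read_direct(out)
```

For a one-off read, slicing the dataset is usually simpler; read_direct is mainly useful when you want to reuse an existing buffer.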
I try to create objects for X and Y as follows:
obj1 = f['X']
obj2 = f['Y']
Then, if I use commands like
obj1.shape
obj1.dtype
I get the error:
AttributeError: 'Group' object has no attribute 'shape'
I can see that these commands don't work because I use them on X and Y, which are folders that contain no data, only other folders.
So my question is: how do I get down to the folders named A, B, and 1-10 to read the data?
I couldn't find a way to do that, even in the documentation: http://docs.h5py.org/en/latest/quick.html
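If the layout is known, as it is here, you can index the file with a slash-separated path and go straight to a dataset; chained indexing such as f["X"]["0"]["B"] is equivalent. The sketch uses a hypothetical in-memory toy file with made-up values in the layout described in the question; indexing a dataset with [()] reads all of it into a NumPy array.

```python
import h5py
import numpy as np

with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    # Made-up data laid out like the file in the question.
    f.create_dataset("X/0/A", data=np.arange(6.0).reshape(2, 3))
    f.create_dataset("X/0/B", data=np.ones(4))
    for i in range(1, 11):
        f.create_dataset(f"Y/{i}", data=np.full(3, float(i)))

    # A full path goes straight down to the dataset.
    a = f["X/0/A"][()]
    # Chained indexing reaches the same kind of object one level at a time.
    b = f["X"]["0"]["B"][()]

    # Member names are strings, so sort them numerically to get 1, 2, ..., 10.
    y = {name: f["Y"][name][()] for name in sorted(f["Y"].keys(), key=int)}
```

Because the arrays are read inside the with block, a, b and the values in y remain usable after the file is closed.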
Answer 1:
You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or type, datasets do.
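To tell the two kinds of object apart before asking for shape or dtype, an isinstance check against h5py.Dataset and h5py.Group is enough. The helper describe below is hypothetical, and the file is an in-memory toy with made-up values.

```python
import h5py
import numpy as np


def describe(obj):
    """Hypothetical helper: shape/dtype for a dataset, member names for a group."""
    if isinstance(obj, h5py.Dataset):
        return ("dataset", obj.shape, obj.dtype)
    if isinstance(obj, h5py.Group):
        return ("group", sorted(obj.keys()))
    return ("other",)


with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("X/0/A", data=np.zeros((2, 3)))  # made-up data
    x_info = describe(f["X"])        # ('group', ['0'])
    a_info = describe(f["X/0/A"])    # ('dataset', (2, 3), dtype('float64'))
```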
Assuming you do not know the structure of your hierarchy in advance, you can use a recursive algorithm to yield, via an iterator, full paths to all available datasets in the form group1/group2/.../dataset. Below is an example.
import h5py

def traverse_datasets(hdf_file):

    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = f'{prefix}/{key}'
            if isinstance(item, h5py.Dataset):  # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group):  # test for group (go down)
                yield from h5py_dataset_iterator(item, path)

    for path, _ in h5py_dataset_iterator(hdf_file):
        yield path
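h5py also ships its own recursive walk, Group.visititems, which calls a function with (name, object) for every group and dataset below the starting group. A sketch of the same "collect every dataset path" idea built on it, as an alternative to the hand-written recursion; the names dataset_paths and visitor are hypothetical, and the file is an in-memory toy with made-up values.

```python
import h5py
import numpy as np


def dataset_paths(group):
    """Hypothetical helper: list the paths of all datasets below `group`."""
    paths = []

    def visitor(name, obj):
        # `name` is relative to `group`, e.g. "X/0/A" (no leading slash).
        if isinstance(obj, h5py.Dataset):
            paths.append(name)
        # Returning None lets visititems continue to the next object.

    group.visititems(visitor)
    return paths


with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("X/0/A", data=np.zeros(2))  # made-up data
    f.create_dataset("X/0/B", data=np.zeros(2))
    f.create_dataset("Y/1", data=np.zeros(2))
    found = dataset_paths(f)
```

The paths differ from the recursive version only by the missing leading slash; f["X/0/A"] and f["/X/0/A"] refer to the same dataset when indexing from the file root.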
You can, for example, iterate all dataset paths and output attributes which interest you:
with h5py.File(filename, 'r') as f:
    for dset in traverse_datasets(f):
        print('Path:', dset)
        print('Shape:', f[dset].shape)
        print('Data type:', f[dset].dtype)
Remember that, by default, arrays in HDF5 are not read entirely into memory. You can read one into memory via arr = f[dset][:], where dset is the full path.
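The difference between the lazy dataset handle and an in-memory array can be seen in a small sketch: slicing reads only the selected elements, and only a fully read NumPy array stays usable once the file is closed. The file is a hypothetical in-memory toy with made-up values.

```python
import h5py
import numpy as np

with h5py.File("toy.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("Y/1", data=np.arange(10.0))  # made-up data

    dset = f["Y/1"]          # h5py.Dataset handle; no data read yet
    first_three = dset[:3]   # reads only the selected elements
    arr = dset[:]            # reads the whole dataset into a NumPy array

# arr is an ordinary NumPy array and remains usable after the file is
# closed; the dset handle does not.
total = arr.sum()
```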
Source: https://stackoverflow.com/questions/51548551/reading-nested-h5-group-into-numpy-array