Question
I have an HDF5 file containing a number of different groups all of which have the same number of rows. I also have a Boolean mask for rows to keep or remove. I would like to iterate over all groups in the HDF5 file removing rows according to the mask.
The recommended method to recursively visit all groups is visit(callable), but I can't work out how to pass my mask to the callable.
Here is some code hopefully demonstrating what I would like to do but which doesn't work:
def apply_mask(name, *args):
    h5obj[name] = h5obj[name][mask]

with h5py.File(os.path.join(directory, filename), 'r+') as h5obj:
    h5obj.visit(apply_mask, mask)
This results in the error:
TypeError: visit() takes 2 positional arguments but 3 were given
How can I get my mask array into this function?
Answer 1:
I eventually achieved this with a series of hacky workarounds. If there is a better solution I'd be interested to know about it!
import os

import h5py
import numpy as np

with h5py.File(os.path.join(directory, filename), 'r+') as h5obj:
    # Use the visit callable to append to a list of key names
    h5_keys = []
    h5obj.visit(h5_keys.append)
    # Then loop over those keys and, if they're datasets rather than
    # groups, remove the invalid rows
    for h5_key in h5_keys:
        if isinstance(h5obj[h5_key], h5py.Dataset):
            tmp = np.array(h5obj[h5_key])[mask]
            # There is no way to simply change the dataset because its
            # shape is fixed, causing a broadcast error, so it is
            # necessary to delete and then recreate it.
            del h5obj[h5_key]
            h5obj.create_dataset(h5_key, data=tmp)
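A possible variation on the same idea, sketched here rather than taken from the thread: visititems passes both the name and the object to the callback, and functools.partial can bind any extra argument so that the callback still has the two-argument signature h5py expects. The helper names collect_dataset and apply_mask are hypothetical, and the sketch assumes that every dataset in the file has the same number of rows as the mask.

```python
import functools

import h5py
import numpy as np


def collect_dataset(name, obj, names):
    """Callback for visititems: record the names of datasets only."""
    if isinstance(obj, h5py.Dataset):
        names.append(name)
    # Returning None tells h5py to keep walking the file.


def apply_mask(h5obj, mask):
    """Replace every dataset in h5obj with the rows where mask is True."""
    names = []
    # functools.partial binds the extra argument, leaving the
    # (name, obj) signature that visititems calls.
    h5obj.visititems(functools.partial(collect_dataset, names=names))
    # Modify the file only after the traversal has finished.
    for name in names:
        kept = h5obj[name][...][mask]  # read into memory, then filter rows
        del h5obj[name]
        h5obj.create_dataset(name, data=kept)
    return names
```

It would be called as apply_mask(h5obj, mask) inside the same with h5py.File(..., 'r+') block. As with the answer above, deleting and recreating a dataset discards its attributes, and the space freed by del is not returned to the file system, so the file does not shrink on its own.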
Source: https://stackoverflow.com/questions/50027646/how-can-i-loop-over-hdf5-groups-in-python-removing-rows-according-to-a-mask