How can I loop over HDF5 groups in Python removing rows according to a mask?

Submitted by 不羁岁月 on 2020-01-05 04:05:40

Question


I have an HDF5 file containing a number of different groups all of which have the same number of rows. I also have a Boolean mask for rows to keep or remove. I would like to iterate over all groups in the HDF5 file removing rows according to the mask.

The recommended method to recursively visit all groups is visit(callable), but I can't work out how to pass my mask to the callable.

Here is some code that shows what I would like to do, but it doesn't work:

def apply_mask(name, *args):
    h5obj[name] = h5obj[name][mask]

with h5py.File(os.path.join(directory, filename), 'r+') as h5obj:
    h5obj.visit(apply_mask, mask)

This results in the error:

TypeError: visit() takes 2 positional arguments but 3 were given

How can I get my mask array into this function?
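The TypeError arises because h5py calls the visitor with a fixed set of arguments: `visit()` passes only the object's name, and `visititems()` passes `(name, obj)`. Any extra data such as the mask therefore has to be bound in advance, for example with `functools.partial` or a closure. The following is a minimal sketch, not the asker's or h5py's own method. It assumes that every dataset to be filtered has `len(mask)` rows along axis 0; `mask_datasets` and `_collect_maskable` are hypothetical helper names. Because deleting or creating objects while the traversal is still running seems risky, the sketch only collects names during the visit and rewrites the datasets afterwards.

```python
import functools

import h5py
import numpy as np


def _collect_maskable(mask, names, name, obj):
    # visititems() calls this as f(name, obj); `mask` and `names` are
    # pre-bound with functools.partial
    if isinstance(obj, h5py.Dataset) and obj.shape[:1] == (len(mask),):
        names.append(name)
    # Returning None tells h5py to keep visiting


def mask_datasets(h5obj, mask):
    """Keep only the rows where `mask` is True in every dataset whose
    first dimension equals len(mask). Hypothetical helper."""
    mask = np.asarray(mask, dtype=bool)
    names = []
    # Pass 1: only read while the traversal is running
    h5obj.visititems(functools.partial(_collect_maskable, mask, names))
    # Pass 2: rewrite the collected datasets once the traversal has finished
    for name in names:
        attrs = dict(h5obj[name].attrs)   # attributes would be lost on re-creation
        data = h5obj[name][()][mask]      # read the whole dataset, filter rows
        del h5obj[name]
        h5obj.create_dataset(name, data=data).attrs.update(attrs)
```

Inside the question's `with h5py.File(..., 'r+') as h5obj:` block this would be called as `mask_datasets(h5obj, mask)`. Deleting a dataset does not shrink the file on disk. The freed space is only reclaimed by rewriting the file, for example with the `h5repack` tool.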


Answer 1:


I eventually achieved this with a series of hacky workarounds. If there is a better solution I'd be interested to know about it!

import os

import h5py
import numpy as np

# `directory`, `filename` and the Boolean `mask` are defined as in the question
with h5py.File(os.path.join(directory, filename), 'r+') as h5obj:
    # visit() calls its argument with each object's name; list.append
    # returns None, so the traversal continues and every name is collected
    h5_keys = []
    h5obj.visit(h5_keys.append)
    # Then loop over those keys and, if they're datasets rather than
    # groups, remove the invalid rows
    for h5_key in h5_keys:
        if isinstance(h5obj[h5_key], h5py.Dataset):
            tmp = np.array(h5obj[h5_key])[mask]
            # The dataset was created with a fixed shape, so writing fewer
            # rows into it causes a broadcast error; it is therefore
            # necessary to delete and then recreate it.
            del h5obj[h5_key]
            h5obj.create_dataset(h5_key, data=tmp)
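The delete-and-recreate step in the code above is needed only because the datasets have a fixed shape. If a dataset is created resizable (chunked storage with `maxshape=(None, ...)`), the kept rows can instead be written back to the start of the dataset, which is then shrunk with `Dataset.resize()`. This keeps its attributes and creation settings intact. The following is a minimal sketch under that assumption; `mask_resizable` is a hypothetical helper name, and the sketch does not apply to datasets that were created with a fixed shape.

```python
import h5py
import numpy as np


def mask_resizable(dset: h5py.Dataset, mask) -> None:
    """Drop rows of a resizable dataset in place. Hypothetical helper;
    assumes `dset` was created chunked with maxshape=(None, ...)."""
    kept = dset[()][np.asarray(mask, dtype=bool)]  # filter rows in memory
    if len(kept):
        dset[: len(kept)] = kept                   # overwrite the leading rows
    dset.resize(len(kept), axis=0)                 # truncate the unused tail
```

A dataset suitable for this would be created with something like `create_dataset(name, data=..., maxshape=(None,), chunks=True)`.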


Source: https://stackoverflow.com/questions/50027646/how-can-i-loop-over-hdf5-groups-in-python-removing-rows-according-to-a-mask
