h5py randomly unable to open object (component not found)

Posted by 生来就可爱ヽ(ⅴ<●) on 2021-01-07 02:55:48

Question


I'm trying to load hdf5 datasets into a pytorch training loop.

Regardless of num_workers in the dataloader, this randomly throws "KeyError: 'Unable to open object (component not found)'" (traceback below).

I'm able to start the training loop, but not able to get through 1/4 of one epoch without this error, which happens for random 'datasets' (each a 2D array). I'm able to load these arrays separately in the console using the regular f['group/subgroup'][()], so it doesn't appear that the hdf file is corrupted or that there's anything wrong with the datasets/arrays.

I've tried:

  • adjusting num_workers, as per various other issues people have had with pytorch - still happens with num_workers=0.
  • upgrading/downgrading torch, numpy and python versions.
  • using f.close() at the end of the data loader's __getitem__
  • using a fresh conda env and installing dependencies.
  • calling parent groups first, then initialising the array, e.g.: X = f[ID] then X = X[()]
  • using double slashes in the hdf path

Because this recurs with num_workers=0, I figure it's not a multithreading issue, although the traceback seems to point to lines from torch/utils/data/dataloader that prep the next batch.

I just can't figure out why h5py can't see the odd individual dataset, randomly.

IDs are strings matching hdf paths, e.g.: ID = "ID_12345//Ep_-1//AN_67891011//ABC"
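One way to narrow down whether the file or the loader is at fault is to pre-check that every ID in the sample list actually resolves in the file before training starts. A minimal sketch (the function name is mine, not from the original code):

```python
import h5py

def find_missing_ids(hdf_file, list_IDs):
    """Return the subset of list_IDs that h5py cannot resolve to an object.

    "ID in f" walks the same path that f[ID] would, so any path returned
    here would raise the same 'component not found' KeyError at train time.
    """
    missing = []
    with h5py.File(hdf_file, 'r') as f:
        for ID in list_IDs:
            if ID not in f:
                missing.append(ID)
    return missing
```

Running this once over the full ID list before the training loop would show whether the failing paths are genuinely absent or only fail intermittently.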

excerpt from dataloader:

```python
def __getitem__(self, index):

    ID = self.list_IDs[index]

    # Open hdf file in read mode:
    f = h5py.File(self.hdf_file, 'r', libver='latest', swmr=True)

    X = f[ID][()]

    X = X[:, :, np.newaxis]  # torchvision 0.2.1 needs (H x W x C) for transforms

    y = self.y_list[index]

    if self.transform:
        X = self.transform(X)

    return ID, X, y
```
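As an aside on the pattern above: reopening the file on every __getitem__ call is safe but wasteful, and sharing one handle across forked workers is unsafe. A common middle ground is to open the file lazily, once per worker. A sketch (the class name is mine, and it's a plain-Python stand-in for the torch.utils.data.Dataset interface; the original's libver/swmr flags are dropped for simplicity):

```python
import h5py

class LazyHDF5Dataset:
    """Hypothetical variant of the dataset above: the h5py handle is
    opened on first access, so each DataLoader worker process gets its
    own handle instead of inheriting one across fork()."""

    def __init__(self, hdf_file, list_IDs, y_list, transform=None):
        self.hdf_file = hdf_file
        self.list_IDs = list_IDs
        self.y_list = y_list
        self.transform = transform
        self._f = None  # opened lazily in __getitem__

    def __len__(self):
        return len(self.list_IDs)

    def __getitem__(self, index):
        if self._f is None:
            self._f = h5py.File(self.hdf_file, 'r')
        ID = self.list_IDs[index]
        X = self._f[ID][()]
        X = X[:, :, None]  # (H, W) -> (H, W, C), as in the original excerpt
        y = self.y_list[index]
        if self.transform:
            X = self.transform(X)
        return ID, X, y
```

This doesn't explain the 'component not found' error with num_workers=0, but it avoids both per-item open/close overhead and cross-process handle sharing.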

Expected: training for loop

Actual: IDs / datasets / examples load fine initially, then after between 20 and 200 steps...

Traceback (most recent call last):
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 287, in <module>
    main()
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 203, in main
    for i, (IDs, images, labels) in enumerate(train_loader):
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/Documents/BSSA-loc/mamdl/src/data_loading/Data_loader_v3.py", line 59, in __getitem__
    X = f[ID][()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/h5py/_hl/group.py", line 262, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'


Answer 1:


For the record, my best guess is that this was due to a bug in my code for hdf construction, which was stopped and restarted multiple times in append mode. Some datasets appeared complete when queried with f['group/subgroup'][()], but could not be loaded with the pytorch dataloader.

Haven't had this issue since rebuilding the hdf differently.
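If the suspicion is a partially written file from an interrupted append-mode build, one way to audit it is to walk every object and attempt a full read of each dataset, collecting the paths that fail. A rough sketch (the function name is mine; it assumes nothing about the file's layout):

```python
import h5py

def audit_hdf(path):
    """Try to fully read every dataset in the file; return a list of
    (path, error) pairs for any that raise. An empty list means every
    dataset was at least readable, though not necessarily complete."""
    bad = []
    with h5py.File(path, 'r') as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                try:
                    obj[()]  # force a full read, like f[name][()]
                except Exception as e:
                    bad.append((name, repr(e)))
        f.visititems(visit)
    return bad
```

A clean result here wouldn't rule out the kind of inconsistency described above, but a non-empty one would confirm the file itself is at fault.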



Source: https://stackoverflow.com/questions/55473368/h5py-randomly-unable-to-open-object-component-not-found
