Question
I'm trying to load HDF5 datasets in a PyTorch training loop.
Regardless of num_workers in the DataLoader, this randomly throws "KeyError: 'Unable to open object (component not found)'" (traceback below).
I'm able to start the training loop, but I can't get through a quarter of one epoch before the error hits a random 'dataset' (each one a 2D array). I can load these same arrays separately in the console using the usual f['group/subgroup'][()],
so it doesn't appear that the HDF5 file is corrupted or that there's anything wrong with the datasets/arrays themselves.
I've tried:
- adjusting num_workers, as in various other issues people have had with PyTorch; it still happens with num_workers=0.
- upgrading/downgrading torch, numpy and Python versions.
- calling f.close() at the end of the dataloader's __getitem__ (see the sketch after this list).
- using a fresh conda env and installing dependencies.
- opening the parent group first, then reading the array, e.g.:
  X = f[ID]
  X = X[()]
- using double slashes in the HDF5 path.
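For reference, a minimal sketch (hypothetical helper name) combining two of those attempts: closing the file via a context manager and opening the object first, then reading it:

import h5py

# Minimal sketch of two of the attempts above, combined (the helper
# name is hypothetical): close the file on exit via a context manager,
# and open the group/dataset object before reading it into memory.
def load_example(hdf_file, ID):
    with h5py.File(hdf_file, 'r') as f:  # file is closed on exit
        X = f[ID]   # open the object first...
        X = X[()]   # ...then read the whole array into memory
    return X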
Because this recurs with num_workers=0, I figure it's not a multiprocessing issue, although the traceback points to the lines in torch/utils/data/dataloader.py that prepare the next batch.
I just can't figure out why h5py randomly fails to see the odd individual dataset.
IDs are strings matching HDF5 paths, e.g.:
ID = "ID_12345//Ep_-1//AN_67891011//ABC"
Excerpt from the dataloader:

def __getitem__(self, index):
    ID = self.list_IDs[index]
    # Open the HDF5 file in read mode:
    f = h5py.File(self.hdf_file, 'r', libver='latest', swmr=True)
    X = f[ID][()]
    X = X[:, :, np.newaxis]  # torchvision 0.2.1 needs (H x W x C) for transforms
    y = self.y_list[index]
    if self.transform:
        X = self.transform(X)
    return ID, X, y
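For comparison, a pattern often suggested for h5py with the PyTorch DataLoader (not what fixed this, and the class name is hypothetical) is to open the file lazily on first access, so each worker process gets its own handle instead of the file being reopened on every call:

import h5py
import numpy as np
import torch

class LazyHDF5Dataset(torch.utils.data.Dataset):  # hypothetical name
    def __init__(self, hdf_file, list_IDs, y_list, transform=None):
        self.hdf_file = hdf_file
        self.list_IDs = list_IDs
        self.y_list = y_list
        self.transform = transform
        self.f = None  # opened lazily, once per worker process

    def __getitem__(self, index):
        if self.f is None:  # first access in this process opens the file
            self.f = h5py.File(self.hdf_file, 'r')
        ID = self.list_IDs[index]
        X = self.f[ID][()][:, :, np.newaxis]  # (H x W x C) for transforms
        if self.transform:
            X = self.transform(X)
        return ID, X, self.y_list[index]

    def __len__(self):
        return len(self.list_IDs)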
Expected: the training loop runs to completion.
Actual: IDs/datasets/examples load fine initially, then after somewhere between 20 and 200 steps...
Traceback (most recent call last):
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 287, in <module>
    main()
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 203, in main
    for i, (IDs, images, labels) in enumerate(train_loader):
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/Documents/BSSA-loc/mamdl/src/data_loading/Data_loader_v3.py", line 59, in __getitem__
    X = f[ID][()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/h5py/_hl/group.py", line 262, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'
Answer 1:
For the record, my best guess is that this was due to a bug in my code for constructing the HDF5 file, which was stopped and restarted multiple times in append mode.
Some datasets appeared complete when queried with f['group/subgroup'][()],
but could not be loaded through the PyTorch dataloader.
Haven't had this issue since rebuilding the HDF5 file differently.
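A check along those lines (hypothetical helper name) is to walk the full ID list once, forcing a complete read of each dataset, so that half-written entries left by an interrupted append-mode build show up before training starts:

import h5py

# Hypothetical sanity check after (re)building the file: force a full
# read of every expected dataset and collect the IDs that fail, so an
# incompletely written file is caught before the training loop.
def find_bad_ids(hdf_file, list_IDs):
    bad = []
    with h5py.File(hdf_file, 'r') as f:
        for ID in list_IDs:
            try:
                _ = f[ID][()]  # force a full read of the array
            except KeyError:
                bad.append(ID)
    return bad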
Source: https://stackoverflow.com/questions/55473368/h5py-randomly-unable-to-open-object-component-not-found