问题
Is there any easier way to set up the dataloader, because input and target data is the same in case of an autoencoder and to load the data during training? The DataLoader always requires two inputs.
Currently I define my dataloader like this:
X_train = rnd.random((300,100))
X_val = rnd.random((75,100))
train = data_utils.TensorDataset(torch.from_numpy(X_train).float(), torch.from_numpy(X_train).float())
val = data_utils.TensorDataset(torch.from_numpy(X_val).float(), torch.from_numpy(X_val).float())
train_loader= data_utils.DataLoader(train, batch_size=1)
val_loader = data_utils.DataLoader(val, batch_size=1)
and train like this:
for epoch in range(50):
for batch_idx, (data, target) in enumerate(train_loader):
data, target = Variable(data), Variable(target).detach()
optimizer.zero_grad()
output = model(data, x)
loss = criterion(output, target)
回答1:
Why not subclassing TensorDataset to make it compatible with unlabeled data ?
class UnlabeledTensorDataset(TensorDataset):
"""Dataset wrapping unlabeled data tensors.
Each sample will be retrieved by indexing tensors along the first
dimension.
Arguments:
data_tensor (Tensor): contains sample data.
"""
def __init__(self, data_tensor):
self.data_tensor = data_tensor
def __getitem__(self, index):
return self.data_tensor[index]
And something along these lines for training your autoencoder
X_train = rnd.random((300,100))
train = UnlabeledTensorDataset(torch.from_numpy(X_train).float())
train_loader= data_utils.DataLoader(train, batch_size=1)
for epoch in range(50):
for batch in train_loader:
data = Variable(batch)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, data)
回答2:
I believe this is as simple as it gets. Other than that, I guess you will have to implement your own dataset. A sample code is below.
class ImageLoader(torch.utils.data.Dataset):
def __init__(self, root, tform=None, imgloader=PIL.Image.open):
super(ImageLoader, self).__init__()
self.root=root
self.filenames=sorted(glob(root))
self.tform=tform
self.imgloader=imgloader
def __len__(self):
return len(self.filenames)
def __getitem__(self, i):
out = self.imgloader(self.filenames[i]) # io.imread(self.filenames[i])
if self.tform:
out = self.tform(out)
return out
You can then use it as follows.
source_dataset=ImageLoader(root='/dldata/denoise_ae/clean/*.png', tform=source_depth_transform)
target_dataset=ImageLoader(root='/dldata/denoise_ae/clean_cam_n9dmaps/*.png', tform=target_depth_transform)
source_dataloader=torch.utils.data.DataLoader(source_dataset, batch_size=32, shuffle=False, drop_last=True, num_workers=15)
target_dataloader=torch.utils.data.DataLoader(target_dataset, batch_size=32, shuffle=False, drop_last=True, num_workers=15)
To test the 1st batch go as follows.
dataiter = iter(source_dataloader)
images = dataiter.next()
print(images.size())
And finally you can enumerate on the loaded data in the batch training loop as follows.
for i, (source, target) in enumerate(zip(source_dataloader, target_dataloader), 0):
source, target = Variable(source.float().cuda()), Variable(target.float().cuda())
Have fun.
PS. The code samples I shared so not load validation data.
来源:https://stackoverflow.com/questions/45099554/how-to-simplify-dataloader-for-autoencoder-in-pytorch