Question
On this page (https://pytorch.org/docs/stable/torchvision/models.html), it says: "All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]".
Shouldn't the usual mean and std for normalization be [0.5, 0.5, 0.5] and [0.5, 0.5, 0.5]? Why does it use such strange values?
Answer 1:
Using the mean and std of ImageNet is a common practice. They were computed over the millions of images in the ImageNet training set. If you want to train from scratch on your own dataset, you can compute a new mean and std for it. Otherwise, using the ImageNet-pretrained model together with the ImageNet mean and std is recommended.
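To make that concrete, here is a minimal sketch of the usual preprocessing pipeline for a torchvision pretrained model. The mean/std values come straight from the docs quoted above; the choice of resnet18 and the 256/224 resize-crop sizes are just conventional examples, not something the question prescribes:
import torch
from torchvision import models, transforms

# Standard ImageNet preprocessing: scale to [0, 1] via ToTensor,
# then normalize with the ImageNet channel statistics.
preprocess = transforms.Compose([
    transforms.Resize(256),      # conventional pre-crop size (example choice)
    transforms.CenterCrop(224),  # pretrained models expect H, W >= 224
    transforms.ToTensor(),       # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# resnet18 is just an example; pretrained=True is the older torchvision API
# (newer releases use the weights= argument instead).
model = models.resnet18(pretrained=True)
model.eval()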
Answer 2:
In that example, they are using the mean and std of ImageNet. If you look at PyTorch's MNIST examples, though, their mean and std are one-dimensional (since the inputs are greyscale).
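For reference, the official pytorch/examples MNIST script normalizes its single grey channel with one-element tuples (0.1307 and 0.3081 are the mean and std used in that script):
from torchvision import transforms

# One-dimensional (single-channel) normalization, as in pytorch/examples/mnist
normalize = transforms.Normalize((0.1307,), (0.3081,))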
I would definitely recommend correctly normalizing your dataset before training! Otherwise you can run into optimization problems.
Here's some sample code to get you started:
import os
from time import time

import torch
from torchvision import datasets, transforms
from tqdm import tqdm

N_CHANNELS = 1  # MNIST is greyscale; use 3 for RGB datasets

dataset = datasets.MNIST("data", download=True,
                         train=True, transform=transforms.ToTensor())
full_loader = torch.utils.data.DataLoader(dataset, shuffle=False,
                                          num_workers=os.cpu_count())

before = time()
mean = torch.zeros(N_CHANNELS)
std = torch.zeros(N_CHANNELS)
print('==> Computing mean and std..')
for inputs, _labels in tqdm(full_loader):
    for i in range(N_CHANNELS):
        mean[i] += inputs[:, i, :, :].mean()
        std[i] += inputs[:, i, :, :].std()
# With the default batch_size=1 the loader yields one image per step, so
# dividing by len(dataset) averages the per-image statistics. Note that
# averaging per-image stds only approximates the true full-dataset std.
mean.div_(len(dataset))
std.div_(len(dataset))
print(mean, std)
print("time elapsed: ", time() - before)
Source: https://stackoverflow.com/questions/58151507/why-pytorch-officially-use-mean-0-485-0-456-0-406-and-std-0-229-0-224-0-2