Question
I wrote an image classification model using PyTorch's pretrained VGG16 model.
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
import urllib
from skimage.transform import resize
from skimage import io
import yaml
# Downloading imagenet 1000 classes list
file = urllib.request.urlopen("https://gist.githubusercontent.com/yrevar/942d3a0ac09ec9e5eb3a/raw/238f720ff059c1f82f368259d1ca4ffa5dd8f9f5/imagenet1000_clsidx_to_labels.txt")
classes = ''
for f in file:
    classes = classes + f.decode("utf-8")
classes = yaml.load(classes)
# Downloading pretrained vgg16 model
model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16', pretrained=True)
print(model)
for param in model.parameters():
    param.requires_grad = False
url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/dog.jpg", "dog.jpg")
image=io.imread(url)
plt.imshow(image)
plt.show()
# resize to 224x224x3
img = resize(image,(224,224,3))
plt.imshow(img)
plt.show()
# Normalizing input for vgg16
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
img1 = mean*img+std
img1 = np.clip(img1,0,1)
img1 = torch.from_numpy(img1).unsqueeze(0)
img1 = img1.permute(0,3,2,1) # batch_size x channels x height x width
model.eval()
pred = model(img1.float())
print(classes[torch.argmax(pred).numpy().tolist()])
The code runs without errors, but it outputs the wrong classes. I am not sure where I went wrong, but if I had to guess, it might be the ImageNet YAML class list or the normalization of the input image. Can anyone tell me where I am making the mistake?
Answer 1:
There are some issues with the image preprocessing. Firstly, the normalisation is calculated as (value - mean) / std, not value * mean + std. Secondly, the values should not be clipped to [0, 1]; the normalisation purposely shifts the values away from [0, 1]. Thirdly, the image as a NumPy array has shape [height, width, 3]; when you permute the dimensions with permute(0, 3, 2, 1), you swap the height and width dimensions, creating a tensor with shape [batch_size, channels, width, height].
img = resize(image,(224,224,3))
# Normalizing input for vgg16
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
img1 = (img - mean) / std  # img is the resized array; NumPy broadcasts mean/std over the channel axis
img1 = torch.from_numpy(img1).unsqueeze(0)
img1 = img1.permute(0, 3, 1, 2) # batch_size x channels x height x width
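To see why the axis order matters, here is a minimal NumPy-only sketch with a non-square dummy array. The 4x6 size and the variable names are illustrative assumptions, not taken from the question; `np.transpose` is used with the same axis orders because it rearranges dimensions the same way `Tensor.permute` does.

```python
import numpy as np

# Dummy "image" with height 4, width 6 and 3 channels (illustrative sizes only).
img = np.random.rand(4, 6, 3)

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Per-channel normalisation: broadcasting applies mean/std along the last axis.
normalized = (img - mean) / std

# Add a batch dimension: batch x height x width x channels.
batch = normalized[np.newaxis, ...]

wrong = np.transpose(batch, (0, 3, 2, 1))  # batch x channels x width x height
right = np.transpose(batch, (0, 3, 1, 2))  # batch x channels x height x width

print(wrong.shape)  # (1, 3, 6, 4)
print(right.shape)  # (1, 3, 4, 6)
```

With a square 224x224 input, both orders produce the same shape, so the mistake raises no error; the model simply receives a transposed image.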
Instead of doing that manually, you can use torchvision.transforms.
from torchvision import transforms
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
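img = resize(image,(224,224,3))
img1 = preprocess(img)
# skimage's resize returns float64, and ToTensor keeps that dtype for NumPy input,
# so keep the .float() cast when calling the model: model(img1.float())
img1 = img1.unsqueeze(0)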
If you use PIL to load the images, you could also resize them by adding transforms.Resize((224, 224)) to the preprocessing pipeline, or you could even add transforms.ToPILImage() to first convert the image to a PIL image (transforms.Resize requires a PIL image).
Source: https://stackoverflow.com/questions/62482336/classification-with-pretrained-pytorch-vgg16-model-and-its-classes