TensorFlow/Keras - expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048)

Question

I'm fairly new to TensorFlow and image classification, so I may be missing key knowledge, which is probably why I'm facing this issue.

I've built a ResNet50 model in TensorFlow to classify dog breeds, using ImageNet weights, and I have successfully trained a neural network that can detect various dog breeds.

I would now like to pass a random image of a dog to my model and have it output which breed it thinks the dog is. However, when I run dog_breed_predictor("<file path to image>"), I get the error expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048). It is raised when the function executes the line Resnet50_model.predict(bottleneck_feature), and I don't know how to get around it.

Here's the code. I've provided all that I feel is relevant to the problem.

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from tqdm import tqdm

from sklearn.datasets import load_files
np_utils = tf.keras.utils

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('dogImages/dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/dogImages/valid')
test_files, test_targets = load_dataset('dogImages/dogImages/test')

#define Resnet50 model
Resnet50_model = ResNet50(weights="imagenet")

def path_to_tensor(img_path):
    #loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    #convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    #convert 3D tensor into 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    #returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(Resnet50_model.predict(img))

###returns True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))

###Obtain bottleneck features from another pre-trained CNN
bottleneck_features = np.load("bottleneck_features/DogResnet50Data.npz")
train_DogResnet50 = bottleneck_features["train"]
valid_DogResnet50 = bottleneck_features["valid"]
test_DogResnet50 = bottleneck_features["test"]

###Define your architecture
Resnet50_model = tf.keras.Sequential()
Resnet50_model.add(tf.keras.layers.GlobalAveragePooling2D(input_shape=train_DogResnet50.shape[1:]))
Resnet50_model.add(tf.keras.layers.Dense(133, activation="softmax"))

Resnet50_model.summary()

###Compile the model
Resnet50_model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
###Train the model
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath="saved_models/weights.best.ResNet50.hdf5",
                                                 verbose=1, save_best_only=True)

Resnet50_model.fit(train_DogResnet50, train_targets,
                  validation_data=(valid_DogResnet50, valid_targets),
                  epochs=20, batch_size=20, callbacks=[checkpointer])

###Load the model weights with the best validation loss.
Resnet50_model.load_weights("saved_models/weights.best.ResNet50.hdf5")

###Calculate classification accuracy on the test dataset
Resnet50_predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_DogResnet50]

#Report test accuracy
test_accuracy = 100*np.sum(np.array(Resnet50_predictions)==np.argmax(test_targets, axis=1))/len(Resnet50_predictions)
print("Test accuracy: %.4f%%" % test_accuracy)

def extract_Resnet50(tensor):
    from keras.applications.resnet50 import ResNet50, preprocess_input
    return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

def dog_breed_predictor(img_path):
    #determine the predicted dog breed
    breed = dog_breed(img_path)
    #display the image
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(cv_rgb)
    plt.show()
    #display relevant predictor result
    if dog_detector(img_path):
        print("This is a dog and its breed is: " + str(breed))
    elif face_detector(img_path):
        print("This is a human but it looks like a: " + str(breed))
    else:
        print("I don't know what this is.")

dog_breed_predictor("dogImages/dogImages/train/016.Beagle/Beagle_01126.jpg")

The image I'm feeding into my function is from the same dataset that was used to train the model (I wanted to see for myself whether the model works as intended), which makes this error extra confusing. What could I be doing wrong?

Answer 1:

Thanks to nessuno's assistance, I figured out the issue. The problem was indeed with the pooling layer of ResNet50.

The following code in my script above:

return ResNet50(weights='imagenet',
                include_top=False).predict(preprocess_input(tensor))

returns a shape of (1, 7, 7, 2048) (admittedly, I do not fully understand why). To get around this, I added the parameter pooling="avg" like so:

return ResNet50(weights='imagenet',
                include_top=False,
                pooling="avg").predict(preprocess_input(tensor))

This instead returns a shape of (1, 2048) (again, admittedly, I do not know why.)
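
As a rough illustration of the shape change, the sketch below mimics global average pooling in plain NumPy. It assumes pooling="avg" amounts to averaging each channel over the 7x7 spatial grid; the array holds random stand-in values, not real ResNet50 features.

```python
import numpy as np

# Stand-in for the (1, 7, 7, 2048) output of ResNet50(include_top=False).
# Random values only; these are NOT real bottleneck features.
feature_map = np.random.rand(1, 7, 7, 2048).astype("float32")

# Global average pooling: average over the two spatial axes (height, width),
# which leaves a single value per channel.
pooled = feature_map.mean(axis=(1, 2))

print(feature_map.shape)  # (1, 7, 7, 2048)
print(pooled.shape)       # (1, 2048)
```

Under that assumption, the batch axis and the 2048 channels survive, while the two spatial axes are averaged away.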

However, the model still expects a 4D input. To get around this, I added the following code to my dog_breed() function:

print(bottleneck_feature.shape) #returns (1, 2048)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.

and this returns a shape of (1, 1, 1, 1, 2048). For some reason, the model still complained that the input was 3D when I added only 2 more dimensions, but stopped once I added a third (this is peculiar, and I would like to find out why).
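
To make the intermediate shapes explicit, here is a small NumPy-only sketch of what each np.expand_dims call produces. The final reshape is an assumption on my part: it targets (batch, 1, 1, 2048), which is what the (1, 1, 2048) per-sample shape in the error message suggests. Whether the Keras version used above accepts that 4D form is not confirmed in this thread.

```python
import numpy as np

# Placeholder for the (1, 2048) output obtained with pooling="avg".
pooled = np.zeros((1, 2048), dtype="float32")

# Each call prepends one axis of length 1.
step1 = np.expand_dims(pooled, axis=0)
step2 = np.expand_dims(step1, axis=0)
step3 = np.expand_dims(step2, axis=0)

print(step1.shape)  # (1, 1, 2048)
print(step2.shape)  # (1, 1, 1, 2048)
print(step3.shape)  # (1, 1, 1, 1, 2048)

# Hypothetical alternative: reshape straight to (batch, 1, 1, channels).
reshaped = pooled.reshape((-1, 1, 1, pooled.shape[-1]))
print(reshaped.shape)  # (1, 1, 1, 2048)
```

Printing train_DogResnet50.shape next to bottleneck_feature.shape is a cheap way to see which layout the trained model was actually fitted on.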

So overall, my dog_breed() function went from:

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

to this:

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    print(bottleneck_feature.shape) #returns (1, 2048)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature)
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

whilst ensuring the parameter pooling="avg" is added to my call to ResNet50.

Answer 2:

The documentation of ResNet50 says something about the constructor parameter input_shape (emphasis is mine):

input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format). It should have exactly 3 inputs channels, and width and height should be no smaller than 197. E.g. (200, 200, 3) would be one valid value.

My guess is that, since you set include_top to False, the network definition pads the input to a bigger shape than 224x224, so when you extract the features you end up with a feature map rather than a feature vector (and that's the cause of your error).

Just try to specify an input_shape like this:

return ResNet50(weights='imagenet',
                include_top=False,
                input_shape=(224, 224, 3)).predict(preprocess_input(tensor))
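
As a quick arithmetic check on where a 7x7 map can come from: ResNet50 halves the spatial size five times, for a total downsampling factor of 32. The helper below is a hypothetical illustration, not part of Keras, and it assumes an input size divisible by 32.

```python
def final_feature_map_size(input_size, total_stride=32):
    """Approximate side length of the last convolutional feature map.

    Assumes a total downsampling factor of 32 (five halvings) and an
    input size that is divisible by it.
    """
    return input_size // total_stride

side = final_feature_map_size(224)
print((1, side, side, 2048))  # (1, 7, 7, 2048)
```

By this estimate a 224x224 input already yields a 7x7x2048 map when no pooling layer follows, so it is worth printing the output shape after trying input_shape to confirm whether it changes.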

Source: https://stackoverflow.com/questions/51231576/tensorflow-keras-expected-global-average-pooling2d-1-input-to-have-shape-1-1

Tags: python, tensorflow, keras, resnet, imagenet