How does data normalization work in keras during prediction?

日久生厌 2020-12-23 10:58

I see that the ImageDataGenerator allows me to specify different styles of data normalization, e.g. featurewise_center, samplewise_center, etc.

I see from the example that using these options requires a call to fit on my training data first. But how does this work during prediction? How do I make sure that the test data is normalized with the statistics computed on the training set?

4 Answers
  • 2020-12-23 11:09

    Yes, this is a real downside of Keras's ImageDataGenerator: you cannot supply the standardization statistics yourself. But there is an easy way to work around this.

    Assuming that you have a function normalize(x) which normalizes an image batch (remember that the generator does not yield a single image but an array of images, i.e. a batch with shape (nr_of_examples_in_batch, *image_dims)), you can make your own generator with normalization by using:

    def gen_with_norm(gen, normalize):
        # Wrap an existing generator and normalize every batch it yields
        for x, y in gen:
            yield normalize(x), y
    

    Then simply use gen_with_norm(datagen.flow(x, y), normalize) wherever you would have used datagen.flow(x, y), for example:
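
    A minimal sketch of this (the model, x_train, y_train, and the batch size here are placeholders):

    batches = gen_with_norm(datagen.flow(x_train, y_train, batch_size=32),
                            normalize)
    model.fit_generator(batches,
                        steps_per_epoch=len(x_train) // 32,
                        epochs=10)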

    Moreover, you can recover the mean and std computed by the fit method from the corresponding attributes of datagen (datagen.mean and datagen.std).
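
    For instance, a sketch of a normalize function built from those recovered statistics (this assumes featurewise_center and featurewise_std_normalization were enabled when datagen.fit was called):

    datagen.fit(x_train)

    def normalize(x):
        # Apply the featurewise statistics that fit computed on the training data;
        # the small constant guards against division by zero
        return (x - datagen.mean) / (datagen.std + 1e-6)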

  • 2020-12-23 11:14

    I am using the datagen.fit function itself:

    from keras.preprocessing.image import ImageDataGenerator
    
    train_datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=True)
    train_datagen.fit(train_data)
    
    test_datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=True)
    # Fit on the *training* data so the test generator applies the same statistics
    test_datagen.fit(train_data)
    

    This way, test_datagen, fitted on the training dataset, learns the training data's statistics and uses those statistics to normalize the test data, for example:
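
    A sketch of the downstream usage (the label arrays and the batch size are assumptions):

    train_generator = train_datagen.flow(train_data, train_labels, batch_size=32)
    test_generator = test_datagen.flow(test_data, test_labels, batch_size=32)

    model.fit_generator(train_generator,
                        steps_per_epoch=len(train_data) // 32,
                        validation_data=test_generator,
                        validation_steps=len(test_data) // 32)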

  • 2020-12-23 11:15

    Use the standardize method of the generator for each element. Here is a complete example for CIFAR-10:

    #!/usr/bin/env python
    
    import keras
    from keras.datasets import cifar10
    from keras.preprocessing.image import ImageDataGenerator
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    
    # input image dimensions
    img_rows, img_cols, img_channels = 32, 32, 3
    num_classes = 10
    
    batch_size = 32
    epochs = 1
    
    # The data, shuffled and split between train and test sets:
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')
    
    # Convert class vectors to binary class matrices.
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)
    
    model = Sequential()
    
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                     input_shape=x_train.shape[1:]))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    
    datagen = ImageDataGenerator(zca_whitening=True)
    
    # Compute principal components required for ZCA
    datagen.fit(x_train)
    
    # Apply normalization (ZCA and others)
    print(x_test.shape)
    for i in range(len(x_test)):
        # this is what you are looking for
        x_test[i] = datagen.standardize(x_test[i])
    print(x_test.shape)
    
    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        steps_per_epoch=x_train.shape[0] // batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test))
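
    The same standardize call is what you would use at prediction time for new data. A sketch, with a random array standing in for a real un-normalized image:

    import numpy as np

    # Stand-in for a new, un-normalized image (same shape/dtype as the training data)
    new_img = np.random.rand(img_rows, img_cols, img_channels).astype('float32')
    new_img = datagen.standardize(new_img)           # apply the fitted normalization
    probs = model.predict(new_img[np.newaxis, ...])  # add the batch dimension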
    
  • 2020-12-23 11:20

    I had the same issue and solved it by applying the same operations that the ImageDataGenerator applies internally:

    from keras import backend as K
    from keras.datasets import cifar10
    from keras.preprocessing.image import ImageDataGenerator

    # Load CIFAR-10 dataset
    (trainX, trainY), (testX, testY) = cifar10.load_data()
    # Cast to float so the float statistics can be applied in place below
    trainX = trainX.astype('float32')
    testX = testX.astype('float32')

    generator = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True)

    # Calculate statistics on the train dataset
    generator.fit(trainX)
    # Apply featurewise_center to test data with statistics from the train data
    testX -= generator.mean
    # Apply featurewise_std_normalization to test data with statistics from the train data
    testX /= (generator.std + K.epsilon())

    # Do your regular fitting
    model.fit_generator(..., validation_data=(testX, testY), ...)
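
    Since testX is now normalized with the training statistics, prediction afterwards works on it directly (a minimal sketch):

    predictions = model.predict(testX, batch_size=32)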
    

    Note that this is only feasible if you have a reasonably small dataset that fits in memory at once, like CIFAR-10. Otherwise the solution proposed by Marcin is more reasonable.
