The Keras ImageDataGenerator class provides the two flow methods flow(X, y) and flow_from_directory(directory)
(https://keras.io/prepr
import skimage.transform
X_data_resized = [skimage.transform.resize(image, new_shape) for image in X_data]
Use the line above, because the scipy.misc.imresize approach is now deprecated.
For a large training dataset, applying transformations such as resizing to the entire training set at once consumes a lot of memory. As Keras does in ImageDataGenerator, it's better to do it batch by batch. As far as I know, there are 2 ways to achieve this without operating on the whole dataset:
Here is sample code if you use TensorFlow as the Keras backend:
import tensorflow as tf
original_dim = (32, 32, 3)
target_size = (64, 64)
# Resize inside the model, so only the current batch is ever upscaled.
inputs = tf.keras.layers.Input(shape=original_dim)
x = tf.keras.layers.Lambda(lambda image: tf.image.resize(image, target_size))(inputs)
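To put rough numbers on the memory argument above, here is a back-of-the-envelope calculation. The dataset size (50,000 images), the batch size (32) and the float64 storage are assumptions picked for illustration, not figures from the question.

```python
def array_bytes(n_images, height, width, channels, bytes_per_value):
    """Size in bytes of a dense (n_images, height, width, channels) array."""
    return n_images * height * width * channels * bytes_per_value

# Assumed numbers for illustration: 50,000 images upscaled to 64x64x3 and
# stored as float64 (8 bytes per value), versus a single batch of 32.
whole_dataset = array_bytes(50_000, 64, 64, 3, 8)
one_batch = array_bytes(32, 64, 64, 3, 8)

print(f"whole dataset: {whole_dataset / 1e9:.1f} GB")  # whole dataset: 4.9 GB
print(f"one batch:     {one_batch / 1e6:.1f} MB")      # one batch:     3.1 MB
```

Under these assumptions, resizing up front needs roughly 1,500 times more memory than resizing one batch at a time.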
For anyone else who wants to do this: the .flow method of ImageDataGenerator does not have a target_size parameter, and we cannot resize an image with the preprocessing_function parameter either, because, as the documentation states, "The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape."
So in order to use .flow, you have to pass images that are already resized; otherwise, use a custom generator that resizes them on the fly.
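Here's a sample custom generator in Keras (it could also be written as a plain Python generator or by any other method):
import math
import numpy as np
import keras
from skimage.io import imread
from skimage.transform import resize

class Custom_Generator(keras.utils.Sequence):
    # image_paths, labels and target_size are illustrative; adapt them to your data
    def __init__(self, image_paths, labels, batch_size, target_size):
        super().__init__()
        self.image_paths, self.labels = image_paths, labels
        self.batch_size, self.target_size = batch_size, target_size
    def __len__(self):
        # number of batches per epoch, not number of samples
        return math.ceil(len(self.labels) / self.batch_size)
    def load_and_preprocess_function(self, paths):
        # load the batch with whatever library you like and resize on the fly
        return np.array([resize(imread(path), self.target_size) for path in paths])
    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch_x = self.load_and_preprocess_function(self.image_paths[batch])
        batch_y = np.array(self.labels[batch])
        return batch_x, batch_y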
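If the images already sit in a NumPy array and you still want .flow for augmentation, a lighter option is to wrap whatever .flow yields and resize each batch as it comes out. The sketch below is only an illustration: resize_nearest and resizing_flow are hypothetical helper names, and the resize is a plain NumPy nearest-neighbor lookup, not the interpolation that skimage or tf.image.resize would give you.

```python
import numpy as np

def resize_nearest(batch, target_size):
    """Nearest-neighbor resize of a (N, H, W, C) batch to (N, new_h, new_w, C)."""
    new_h, new_w = target_size
    _, h, w, _ = batch.shape
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return batch[:, rows][:, :, cols]

def resizing_flow(batches, target_size):
    """Wrap any iterator of (x, y) batches and resize x one batch at a time."""
    for batch_x, batch_y in batches:
        yield resize_nearest(batch_x, target_size), batch_y

# Stand-in for datagen.flow(X, y, batch_size=2): one batch of two 2x2 images.
fake_batches = [(np.arange(8).reshape(2, 2, 2, 1), np.array([0, 1]))]
for x, y in resizing_flow(fake_batches, (4, 4)):
    print(x.shape, y.shape)  # (2, 4, 4, 1) (2,)
```

With a real generator you would write something like resizing_flow(datagen.flow(X, y, batch_size=32), (64, 64)). Keep in mind that .flow loops forever, so you would need to tell fit how many steps make up an epoch.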
flow_from_directory(directory) generates augmented images from a directory holding an arbitrary collection of images, so it needs the target_size parameter to bring all images to the same shape. flow(X, y), on the other hand, augments images that are already stored in X, which is just a numpy array and can easily be preprocessed/resized before being passed to flow, so there is no need for a target_size parameter. As for resizing, I prefer scipy.misc.imresize over PIL.Image resize or cv2.resize, as it can operate on numpy image data.
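import numpy as np
import scipy.misc  # imresize was deprecated in SciPy 1.0 and removed in 1.3, so this needs an older SciPy
new_shape = (28, 28, 3)
X_train_new = np.empty(shape=(X_train.shape[0],) + new_shape)
for idx in range(X_train.shape[0]):  # xrange on Python 2
    X_train_new[idx] = scipy.misc.imresize(X_train[idx], new_shape)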
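The earlier point about why flow_from_directory needs target_size while flow does not can be checked with a quick NumPy experiment. The image sizes below are made up for illustration, and can_batch is a hypothetical helper name.

```python
import numpy as np

def can_batch(images):
    """Return True if the images can be stacked into one (N, H, W, C) array."""
    try:
        np.stack(images)
        return True
    except ValueError:
        # np.stack refuses arrays whose shapes differ.
        return False

mixed = [np.zeros((48, 64, 3)), np.zeros((100, 80, 3))]   # arbitrary files on disk
uniform = [np.zeros((28, 28, 3)), np.zeros((28, 28, 3))]  # already resized, as in X

print(can_batch(mixed))    # False
print(can_batch(uniform))  # True
```

Images read from a directory can have any size, so they have to be brought to a common shape before they fit into one batch array, whereas an X passed to flow is already a single array and therefore already uniform.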