Use a generator for Keras model.fit_generator



 4  2133

I originally tried to use generator syntax when writing a custom generator for training a Keras model. So I yielded from __next__. How
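The pattern described in the question, yielding from __next__, has a well-known pitfall. Putting yield inside __next__ turns __next__ itself into a generator function, so every next() call returns a fresh generator object instead of a batch. Keras then receives generator objects rather than data. The minimal sketch below contrasts that with two working alternatives: a class whose __next__ returns one batch per call, and a plain generator function. The class names, batch_generator, and the toy data list are hypothetical illustrations, not code from the question.

```python
class YieldingIterator:
    """Buggy: yield inside __next__ makes __next__ a generator function."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __iter__(self):
        return self

    def __next__(self):
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]


class BatchIterator:
    """Fixed: __next__ returns one batch per call and cycles forever."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.start = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.start >= len(self.data):
            self.start = 0  # wrap around so the iterator never ends
        batch = self.data[self.start:self.start + self.batch_size]
        self.start += self.batch_size
        return batch


def batch_generator(data, batch_size):
    """Equivalent generator function: yield lives in a plain function."""
    while True:
        for start in range(0, len(data), batch_size):
            yield data[start:start + batch_size]


data = list(range(10))  # hypothetical toy data
buggy = next(YieldingIterator(data, 4))
print(type(buggy).__name__)             # generator, not a batch
fixed = BatchIterator(data, 4)
print([next(fixed) for _ in range(4)])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9], [0, 1, 2, 3]]
gen = batch_generator(data, 4)
print([next(gen) for _ in range(4)])    # same four batches
```

In short: either implement __next__ with return (and keep the position in instance state), or write an ordinary function containing yield and pass the generator it returns to Keras.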


4 Answers

花落未央

2020-12-01 11:41

I have recently played with generators for Keras and finally managed to prepare an example. It uses random data, so training a neural network on it makes no sense, but it's a good illustration of using a Python generator with Keras.

Generate some data

import numpy as np
import pandas as pd
data = np.random.rand(200,2)
expected = np.random.randint(2, size=200).reshape(-1,1)

dataFrame = pd.DataFrame(data, columns = ['a','b'])
expectedFrame = pd.DataFrame(expected, columns = ['expected'])

dataFrameTrain, dataFrameTest = dataFrame[:100],dataFrame[-100:]
expectedFrameTrain, expectedFrameTest = expectedFrame[:100],expectedFrame[-100:]

Generator

def generator(X_data, y_data, batch_size):
    """Yield (X_batch, y_batch) tuples forever, cycling through the data."""
    samples_per_epoch = X_data.shape[0]
    # round up so that a final, partial batch is included
    number_of_batches = int(np.ceil(samples_per_epoch / batch_size))
    counter = 0

    while True:
        start, end = batch_size * counter, batch_size * (counter + 1)
        X_batch = np.array(X_data[start:end]).astype('float32')
        y_batch = np.array(y_data[start:end]).astype('float32')
        counter += 1
        yield X_batch, y_batch

        # restart the counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
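To see how the counter reset behaves, here is a standalone copy of the generator above (with an integer batch count) run on a hypothetical 10-row array with batch_size=4. The third batch is a partial batch of 2 rows, and the fourth batch starts over from row 0. The toy arrays X and y are assumptions made for illustration only.

```python
import numpy as np


def generator(X_data, y_data, batch_size):
    """Yield (X_batch, y_batch) tuples forever, cycling through the data."""
    samples_per_epoch = X_data.shape[0]
    number_of_batches = int(np.ceil(samples_per_epoch / batch_size))
    counter = 0
    while True:
        start, end = batch_size * counter, batch_size * (counter + 1)
        X_batch = np.array(X_data[start:end]).astype('float32')
        y_batch = np.array(y_data[start:end]).astype('float32')
        counter += 1
        yield X_batch, y_batch
        if counter >= number_of_batches:
            counter = 0  # wrap around to the first batch


# hypothetical toy data: 10 samples with 2 features, binary labels
X = np.arange(20).reshape(10, 2)
y = np.arange(10).reshape(-1, 1) % 2

gen = generator(X, y, batch_size=4)
for _ in range(4):
    X_batch, y_batch = next(gen)
    # batch shape and the first feature of its first row
    print(X_batch.shape, y_batch.shape, X_batch[0, 0])
```

The printed shapes go (4, 2), (4, 2), (2, 2), (4, 2), so Keras sees every row once per pass, including the leftover rows in the short batch.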

Keras model

# only Sequential and Dense are needed for this example
from keras.models import Sequential
from keras.layers import Dense


model = Sequential()
model.add(Dense(12, activation='relu', input_dim=dataFrame.shape[1]))
model.add(Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])

#Train the model using generator vs using the full batch
batch_size = 8

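model.fit_generator(
    generator(dataFrameTrain, expectedFrameTrain, batch_size),
    epochs=3,
    # 200 // 8 = 25 steps: computed from the full 200-row frame, so each
    # epoch cycles through the 100 training rows about twice
    steps_per_epoch=dataFrame.shape[0] // batch_size,
    validation_data=generator(dataFrameTest, expectedFrameTest, batch_size * 2),
    # evaluated as (200 // 8) * 2 = 50 steps, not 200 // (8 * 2)
    validation_steps=dataFrame.shape[0] // batch_size * 2
)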
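The step counts are worth checking by hand: steps_per_epoch and validation_steps should normally be integers equal to the number of batches in one pass over the training and validation sets. A small sketch of that arithmetic for this example's sizes (100 training rows with batch size 8, 100 test rows with batch size 16), which also shows how Python evaluates an expression of the form n / batch_size * 2. The helper steps_for is a hypothetical name, not a Keras function.

```python
import math


def steps_for(num_samples, batch_size):
    """Batches needed for one full pass, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)


n_train, n_test, batch_size = 100, 100, 8

print(steps_for(n_train, batch_size))     # 13 (12 full batches + 1 of 4 rows)
print(n_train // batch_size)              # 12, drops the partial batch
print(steps_for(n_test, batch_size * 2))  # 7 (6 full batches + 1 of 4 rows)

# / and * have equal precedence and associate left to right,
# so 200 / 8 * 2 means (200 / 8) * 2, not 200 / (8 * 2)
print(200 / batch_size * 2)               # 50.0
print(200 / (batch_size * 2))             # 12.5
```

Using the ceiling matches a generator that emits a short last batch; floor division is the choice when the last partial batch should simply roll over into the next epoch.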

#without generator
#model.fit(
#    x = np.array(dataFrame),
#    y = np.array(expected),
#    batch_size = batch_size,
#    epochs = 3
#)

Output

Epoch 1/3
25/25 [==============================] - 3s - loss: 0.7297 - acc: 0.4750 - val_loss: 0.7183 - val_acc: 0.5000
Epoch 2/3
25/25 [==============================] - 0s - loss: 0.7213 - acc: 0.3750 - val_loss: 0.7117 - val_acc: 0.5000
Epoch 3/3
25/25 [==============================] - 0s - loss: 0.7132 - acc: 0.3750 - val_loss: 0.7065 - val_acc: 0.5000


余生分开走

2020-12-01 11:42

I would like to upgrade Vaasha's code to TensorFlow 2.x, which brings training efficiencies as well as easier data processing. This is particularly useful for image processing.

Process the data either with a generator function, as Vaasha did in the example above, or with the tf.data.Dataset API. The latter approach is very useful when processing datasets that come with metadata. For example, MNIST can be loaded and processed with a few statements:

import tensorflow as tf # Ensure that TensorFlow 2.x is used
tf.compat.v1.enable_eager_execution()
import tensorflow_datasets as tfds # Needed if you are using any of the tf datasets such as MNIST, CIFAR10
mnist_train = tfds.load(name="mnist", split="train")

Use tfds.load to load the datasets, then process the data as needed (for example, converting categorical variables, resizing, etc.).

Now upgrade the Keras model to TensorFlow 2.x:

model = tf.keras.Sequential()  # TensorFlow 2.x upgrade
model.add(tf.keras.layers.Dense(12, activation='relu', input_dim=dataFrame.shape[1]))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])

# Train the model using the generator instead of the full batch
batch_size = 8

# fit_generator is deprecated since TF 2.1; model.fit accepts generators directly
model.fit(
    generator(dataFrameTrain, expectedFrameTrain, batch_size),
    epochs=3,
    steps_per_epoch=len(dataFrameTrain) // batch_size,
    validation_data=generator(dataFrameTest, expectedFrameTest, batch_size * 2),
    validation_steps=len(dataFrameTest) // (batch_size * 2)
)

With these changes, the model runs under TensorFlow 2.x.


梦毁少年i

2020-12-01 11:44

I can't help debug your code since you didn't post it, but here is an abbreviated version of a custom data generator I wrote for a semantic segmentation project, which you can use as a template:

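import os
import random

import cv2
import numpy as np

# INPUT_SHAPE: the (width, height) tuple that cv2.resize expects, defined elsewhere

def generate_data(directory, batch_size):
    """Replaces Keras' native ImageDataGenerator."""
    directory = os.path.expanduser(directory)  # os.listdir does not expand '~'
    i = 0
    file_list = os.listdir(directory)
    while True:
        image_batch = []
        for _ in range(batch_size):
            if i == len(file_list):
                # start a new pass over the files in a new random order
                i = 0
                random.shuffle(file_list)
            sample = file_list[i]
            i += 1
            # os.listdir returns bare file names, so join them with the directory
            image = cv2.resize(cv2.imread(os.path.join(directory, sample)), INPUT_SHAPE)
            # scale pixel values from [0, 255] to roughly [-1, 1]
            image_batch.append((image.astype(float) - 128) / 128)

        # abbreviated: for training, yield (inputs, targets), e.g. (images, masks)
        yield np.array(image_batch)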

Usage:

data_dir = os.path.expanduser('~/my_data')
model.fit_generator(
    generate_data(data_dir, batch_size),
    steps_per_epoch=len(os.listdir(data_dir)) // batch_size)
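The wrap-around-and-reshuffle index logic of generate_data can be checked on its own, without cv2 or image files. The sketch below keeps that logic, copies the list so the caller's order is untouched, uses a seeded random.Random so runs are repeatable, and splits each batch into inputs and targets the way a segmentation model would need. The function name cycle_batches and the (image_path, mask_path) pairs are hypothetical.

```python
import random


def cycle_batches(samples, batch_size, seed=None):
    """Yield lists of batch_size samples forever, reshuffling after each pass.

    Same index logic as generate_data, minus the image loading.
    """
    rng = random.Random(seed)
    samples = list(samples)  # copy so the caller's list is not shuffled
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if i == len(samples):
                i = 0
                rng.shuffle(samples)  # new order for the next pass
            batch.append(samples[i])
            i += 1
        yield batch


# hypothetical (image_path, mask_path) pairs for a segmentation task
pairs = [(f"img_{k}.png", f"mask_{k}.png") for k in range(5)]
gen = cycle_batches(pairs, batch_size=2, seed=0)
for _ in range(3):
    images, masks = zip(*next(gen))  # split each batch into inputs and targets
    print(images, masks)
```

Note that, as in the original template, a batch can straddle two passes: with 5 samples and batch_size=2, the third batch holds the last sample of the first pass and the first sample of the reshuffled second pass.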


心在旅途

2020-12-01 11:58

This is how I implemented it for reading files of any size, and it works like a charm.

import pandas as pd

# num_labels, num_features and batchz (the batch size) are defined elsewhere
hdr = []
for i in range(num_labels + num_features):
    hdr.append("Col-" + str(i))  # the data files have no header, so I provide one
                                 # for pd.read_csv to work in chunks

def tgen(filename):
    while True:
        # open a fresh chunked reader at the start of every pass over the file
        reader = pd.read_csv(filename, chunksize=batchz, names=hdr, header=None)
        for chunk in reader:
            W = chunk.values        # labels and features
            Y = W[:, :num_labels]   # labels
            X = W[:, num_labels:]   # features
            X = X / 255             # any required transformation
            yield X, Y

Then, back in the main program, I have:

nval=number_of_validation_samples//batchz
ntrain=number_of_training_samples//batchz
ftgen=tgen("training.csv")
fvgen=tgen("validation.csv")

history = model.fit_generator(ftgen,
                steps_per_epoch=ntrain,
                validation_data=fvgen,
                validation_steps=nval,
                epochs=number_of_epochs,
                callbacks=[checkpointer, stopper],
                verbose=2)
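A self-contained run of the chunked-reading idea on a tiny temporary CSV shows that tgen restarts from the top of the file after the last, shorter chunk. The sizes (1 label column, 2 feature columns, 5 rows, chunks of 2) and the row values are hypothetical; the generator body follows the corrected version of the answer's tgen.

```python
import os
import tempfile

import pandas as pd

num_labels, num_features, batchz = 1, 2, 2  # hypothetical sizes
hdr = ["Col-" + str(i) for i in range(num_labels + num_features)]


def tgen(filename):
    while True:
        # a fresh chunked reader for every pass over the file
        reader = pd.read_csv(filename, chunksize=batchz, names=hdr, header=None)
        for chunk in reader:
            W = chunk.values
            Y = W[:, :num_labels]
            X = W[:, num_labels:] / 255
            yield X, Y


# write 5 header-less rows: label, feature, feature
rows = [f"{k % 2},{k},{2 * k}" for k in range(5)]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("\n".join(rows) + "\n")
    path = f.name

gen = tgen(path)
for _ in range(4):
    X, Y = next(gen)
    print(X.shape, Y.ravel())  # chunk sizes go 2, 2, 1, then 2 again
gen.close()
os.remove(path)
```

Because the final chunk of each pass is shorter, steps_per_epoch computed with floor division (as in the main program above) lets that short chunk shift into the following epoch rather than being skipped.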
