I have 8,000 images which I am loading with sklearn.datasets.load_files and passing through ResNet from Keras to get bottleneck features. However, this task is taking hours on
This sounds like a job for the Keras ImageDataGenerator class and its ImageDataGenerator.flow_from_directory method. You don't have to use data augmentation with it (which would slow things down further), and you can choose a batch size so images are pulled from the directory in chunks instead of all being loaded into memory at once.
Copied from https://keras.io/preprocessing/image/ and slightly modified with notes.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(  # <- customize your transformations
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,  # <- control how many images are loaded each batch
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

model.fit_generator(  # model is your compiled Keras model
    train_generator,
    steps_per_epoch=2000,  # <- reduce here to lower the total images used per epoch
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
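That snippet covers training, but since your goal is extracting bottleneck features, the same generator approach works with predict_generator instead of fit_generator. Here's a minimal sketch along those lines, assuming your images live under data/train/<class>/ subfolders and using ResNet50 from keras.applications (the path, target size, and batch size are placeholders to adjust to your setup):

import math
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# ResNet50 without its classification head; pooling='avg' yields one
# 2048-dim feature vector per image
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
generator = datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),  # ResNet50's default input size
    batch_size=32,           # images are loaded 32 at a time, not all at once
    class_mode=None,         # no labels needed for feature extraction
    shuffle=False)           # preserve file order so rows map to filenames

steps = int(math.ceil(generator.samples / float(generator.batch_size)))
bottleneck_features = model.predict_generator(generator, steps=steps)
np.save('bottleneck_features.npy', bottleneck_features)

With shuffle=False, row i of bottleneck_features corresponds to generator.filenames[i], so you can join the features back to your labels afterwards.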
Edit
Per your question below... steps_per_epoch determines how many batches are loaded for each epoch.
For example, batch_size=32 with steps_per_epoch=50 would give you 32 × 50 = 1,600 images total for that epoch, which is exactly 20% of your 8,000 images. Note that if you run into memory problems with a batch size of 32, you may want to decrease the batch size and increase steps_per_epoch so the epoch still covers the same number of images. It will take some tinkering to get it right.
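For instance, here's a sketch of an equivalent configuration that halves the batch size and doubles steps_per_epoch so the epoch still covers 1,600 images (it reuses train_datagen and validation_generator from above; the numbers are illustrative, not tuned values):

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=16,  # half of 32 -> roughly half the memory per batch
    class_mode='binary')

model.fit_generator(
    train_generator,
    steps_per_epoch=100,  # doubled from 50, so 16 * 100 = 1,600 images per epoch
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)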