How to load a percentage of data with sklearn.datasets.load_files

Asked 2021-01-14 11:10 by 春和景丽

I have 8,000 images which I am loading with sklearn.datasets.load_files and passing through ResNet from Keras to get bottleneck features. However, this task is taking hours on…

1 Answer
  • Answered 2021-01-14 11:21

    This sounds like a better fit for the Keras ImageDataGenerator class and its ImageDataGenerator.flow_from_directory method. You don't have to use data augmentation with it (which would slow things down further), but you can choose a batch size to pull from the directory instead of loading all the images at once.

    Copied from https://keras.io/preprocessing/image/ and slightly modified with notes.

    from keras.preprocessing.image import ImageDataGenerator  # import needed for this snippet

    train_datagen = ImageDataGenerator(  # <- customize your transformations
            rescale=1./255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)
    
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    train_generator = train_datagen.flow_from_directory(
            'data/train',
            target_size=(150, 150),
            batch_size=32,  # <- control how many images are loaded each batch
            class_mode='binary')
    
    validation_generator = test_datagen.flow_from_directory(
            'data/validation',
            target_size=(150, 150),
            batch_size=32,
            class_mode='binary')
    
    # `model` is assumed to be a compiled Keras model defined elsewhere
    model.fit_generator(
            train_generator,
            steps_per_epoch=2000,  # <- reduce this to use fewer images per epoch
            epochs=50,
            validation_data=validation_generator,
            validation_steps=800)
    
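    Since your goal is bottleneck features rather than training, you can point the same kind of generator at a ResNet and run prediction in batches instead of loading all 8,000 images up front. A minimal sketch, assuming the bundled keras.applications ResNet50 and a hypothetical 'data/train' directory with one subfolder per class; shuffle=False keeps the output rows aligned with the generator's filenames:

    from keras.applications.resnet50 import ResNet50, preprocess_input
    from keras.preprocessing.image import ImageDataGenerator

    # include_top=False drops the classifier head; pooling='avg' yields one
    # 2048-dim bottleneck vector per image
    base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    bottleneck_generator = datagen.flow_from_directory(
            'data/train',            # <- hypothetical path, one subfolder per class
            target_size=(224, 224),  # ResNet50's expected input size
            batch_size=32,
            class_mode=None,         # no labels needed for feature extraction
            shuffle=False)           # keep order so rows match bottleneck_generator.filenames

    # steps caps how many batches are processed, so you can extract features
    # for just a fraction of the data: 50 * 32 = 1,600 images here
    features = base_model.predict_generator(bottleneck_generator, steps=50)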

    Edit

    Per your question below... steps_per_epoch determines how many batches are loaded for each epoch.

    For example:

    • steps_per_epoch = 50
    • batch_size = 32
    • epochs = 1

    That would give you 50 × 32 = 1,600 images in total for that epoch, which is exactly 20% of your 8,000 images. Note that if you run into memory problems with a batch size of 32, you may want to decrease the batch size and increase steps_per_epoch. It will take some tinkering to get it right.
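
    As a quick sanity check, the relationship is simple arithmetic; a minimal sketch, where the 8,000-image total comes from your question and 20% is the fraction worked out above:

    total_images = 8000      # from the question
    batch_size = 32
    target_fraction = 0.20   # fraction of the dataset to use per epoch

    steps_per_epoch = int(total_images * target_fraction / batch_size)
    print(steps_per_epoch)               # -> 50
    print(steps_per_epoch * batch_size)  # -> 1600 images per epoch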
