How to load a percentage of data with sklearn.datasets.load_files

Asked 2021-01-14 11:10 by 春和景丽

I have 8,000 images which I am loading with sklearn.datasets.load_files and passing through ResNet from Keras to get bottleneck features. However, this task is taking hours on…

1 Answer
  • Answered 2021-01-14 11:21

    This sounds like a better fit for the Keras ImageDataGenerator class and its ImageDataGenerator.flow_from_directory method. You don't have to use data augmentation with it (which would slow things down further), but you can choose a batch size to pull from the directory instead of loading all the images at once.

    Copied from https://keras.io/preprocessing/image/ and slightly modified with notes.

    from keras.preprocessing.image import ImageDataGenerator  # import needed for this snippet

    train_datagen = ImageDataGenerator(  # <- customize your transformations
            rescale=1./255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)
    
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    train_generator = train_datagen.flow_from_directory(
            'data/train',
            target_size=(150, 150),
            batch_size=32,  # <- control how many images are loaded each batch
            class_mode='binary')
    
    validation_generator = test_datagen.flow_from_directory(
            'data/validation',
            target_size=(150, 150),
            batch_size=32,
            class_mode='binary')
    
    # `model` is assumed to be a compiled Keras model defined elsewhere
    model.fit_generator(
            train_generator,
            steps_per_epoch=2000,  # <- reduce this to use fewer images per epoch
            epochs=50,
            validation_data=validation_generator,
            validation_steps=800)
    
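    Since your goal is bottleneck features rather than training, you can point the same kind of generator at a ResNet and run prediction in batches instead of loading all 8,000 images up front. A minimal sketch, assuming the bundled keras.applications ResNet50 and a hypothetical 'data/train' directory with one subfolder per class; shuffle=False keeps the output rows aligned with the generator's filenames:

    from keras.applications.resnet50 import ResNet50, preprocess_input
    from keras.preprocessing.image import ImageDataGenerator

    # include_top=False drops the classifier head; pooling='avg' yields one
    # 2048-dim bottleneck vector per image
    base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    bottleneck_generator = datagen.flow_from_directory(
            'data/train',            # <- hypothetical path, one subfolder per class
            target_size=(224, 224),  # ResNet50's expected input size
            batch_size=32,
            class_mode=None,         # no labels needed for feature extraction
            shuffle=False)           # keep order so rows match bottleneck_generator.filenames

    # steps caps how many batches are processed, so you can extract features
    # for just a fraction of the data: 50 * 32 = 1,600 images here
    features = base_model.predict_generator(bottleneck_generator, steps=50)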

    Edit

    Per your question below... steps_per_epoch determines how many batches are loaded for each epoch.

    For example:

    • steps_per_epoch = 50
    • batch_size = 32
    • epochs = 1

    That would give you 50 × 32 = 1,600 images in total for that epoch, which is exactly 20% of your 8,000 images. Note that if you run into memory problems with a batch size of 32, you may want to decrease the batch size and increase steps_per_epoch. It will take some tinkering to get it right.
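
    As a quick sanity check, the relationship is simple arithmetic; a minimal sketch, where the 8,000-image total comes from your question and 20% is the fraction worked out above:

    total_images = 8000      # from the question
    batch_size = 32
    target_fraction = 0.20   # fraction of the dataset to use per epoch

    steps_per_epoch = int(total_images * target_fraction / batch_size)
    print(steps_per_epoch)               # -> 50
    print(steps_per_epoch * batch_size)  # -> 1600 images per epoch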
