Loading images in Keras for CNN from directory but label in CSV file

[愿得一人] 2021-01-27 22:20

I have a set of image files in a directory, train_images = './data/images', and their labels in a CSV file, train_labels = './data/labels.csv'.

For example - There are

2 answers
  • 2021-01-27 23:04

    Here's my example. It uses pandas to read the CSV and the flow_from_dataframe method of ImageDataGenerator to load the images. The CSV I was using had two columns:

    x_col="Image"
    y_col="Id"
    

    The first column holds the filename, e.g. xxxx.jpg, and the second column holds the class. The data comes from the Kaggle humpback whale challenge, so here the class says which whale it is. The image files are in the directory "../input/humpback-whale-identification/train/".

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Dense, Activation, Conv2D, Flatten,
                                         Dropout, MaxPooling2D, BatchNormalization)
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras import regularizers, optimizers
    import os
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    

    Read the CSV using pandas:

    # dtype=str reads every column as text, so the class labels stay strings
    traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)
    

    Now using ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
    train_generator = datagen.flow_from_dataframe(
        dataframe=traindf,
        directory="../input/humpback-whale-identification/train/",
        x_col="Image",
        y_col="Id",
        subset="training",
        batch_size=32,
        seed=42,
        shuffle=True,
        class_mode="categorical",
        target_size=(100, 100))
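
    The datagen above reserves 25% of the rows with validation_split=0.25, but only the "training" subset is requested. A minimal sketch of building both subsets from the same DataFrame follows. The helper name make_generators and its defaults are my own, not part of Keras, and the import sits inside the function only so the sketch can be loaded without TensorFlow installed:

```python
def make_generators(df, image_dir, x_col="Image", y_col="Id",
                    target_size=(100, 100), batch_size=32,
                    validation_split=0.25, seed=42):
    """Return (train_generator, valid_generator) built from one DataFrame.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Imported here so that defining the function does not require TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255.0,
                                 validation_split=validation_split)

    # Arguments shared by both subsets.
    common = dict(dataframe=df, directory=image_dir,
                  x_col=x_col, y_col=y_col,
                  batch_size=batch_size, seed=seed,
                  class_mode="categorical", target_size=target_size)

    # Shuffle the training batches; keep validation order fixed.
    train_gen = datagen.flow_from_dataframe(subset="training",
                                            shuffle=True, **common)
    valid_gen = datagen.flow_from_dataframe(subset="validation",
                                            shuffle=False, **common)
    return train_gen, valid_gen
```

    With the DataFrame from above this would be called as make_generators(traindf, "../input/humpback-whale-identification/train/"), and the second generator can then be passed to model.fit as validation_data.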
    

    Sometimes the filename/ID in the CSV doesn't have an extension. In that case I used the following to add one. Do this before passing the DataFrame to flow_from_dataframe:

    def append_ext(fn):
        return fn+".jpg"
    
    traindf["Image"]=traindf["Image"].apply(append_ext)
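
    One thing to watch with append_ext: running it twice, or on a CSV where some names already end in .jpg, produces names like xxxx.jpg.jpg. A small variant that only adds the extension when none is present could look like this. The name ensure_ext is my own, and it assumes the IDs themselves contain no dots:

```python
import os


def ensure_ext(filename, ext=".jpg"):
    """Append ext only when filename has no extension yet.

    Hypothetical helper; assumes IDs without an extension contain no dots,
    because os.path.splitext treats everything after the last dot as one.
    """
    _, current_ext = os.path.splitext(filename)
    return filename if current_ext else filename + ext


# Applied to the DataFrame column the same way as append_ext:
# traindf["Image"] = traindf["Image"].apply(ensure_ext)
```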
    
    

    Well hope that is helpful! It's my first try at answering a Q here :-)

    The Kaggle dataset/challenge is here https://www.kaggle.com/c/humpback-whale-identification

    Note: I've seen people do this in all kinds of ways on Kaggle, but this seems the easiest!

  • 2021-01-27 23:16

    You can use pandas to read the CSV file into a DataFrame with the read_csv function:

    import pandas as pd
    
    df = pd.read_csv('csvfilename', delimiter=',')
    

    Then use the flow_from_dataframe function of the ImageDataGenerator class.

    There is a tutorial at this link

    flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
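
    Note the validate_filenames=True default in that signature. As far as I understand it, rows whose file cannot be found are skipped rather than raising an error, so a mismatch between the CSV and the directory can silently shrink the dataset. A rough sketch for checking this up front with only the standard library is below. The function name missing_images is my own, and the default column name simply mirrors x_col='filename' from the signature:

```python
import csv
import os


def missing_images(csv_path, image_dir, x_col="filename"):
    """Return the filenames listed in column x_col that are not files in image_dir.

    Hypothetical helper for a quick sanity check; not part of Keras or pandas.
    """
    with open(csv_path, newline="") as f:
        names = [row[x_col] for row in csv.DictReader(f)]
    return [name for name in names
            if not os.path.isfile(os.path.join(image_dir, name))]
```

    For the layout in the question this would be called as missing_images('./data/labels.csv', './data/images', x_col=...) with whatever the filename column is actually called. An empty list means every row has a matching file.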
