I have a set of image files in a directory train_images = './data/images'
and train_labels = './data/labels.csv'
For example - There are
Here's my example using ImageDataGenerator's flow_from_dataframe function, with pandas to read the CSV. The CSV I was using had two columns:
x_col="Image"
y_col="Id"
So the first column is the filename, e.g. xxxx.jpg, and the second column is the class. Since this is from the Kaggle humpback whale challenge, the class is what kind of whale it is. The image files are in the directory "../input/humpback-whale-identification/train/"
from tensorflow.keras.models import Sequential
# Parentheses let the import list continue over several lines.
from tensorflow.keras.layers import (Dense, Activation, Conv2D, Flatten,
                                     Dropout, MaxPooling2D, BatchNormalization)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Import from tensorflow.keras throughout rather than mixing in standalone keras.
from tensorflow.keras import regularizers, optimizers
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
So read the CSV using pandas:
traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)
Now create the ImageDataGenerator and build the training generator from the DataFrame:
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="../input/humpback-whale-identification/train/",
    x_col="Image",
    y_col="Id",
    subset="training",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    target_size=(100, 100))
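Since the generator above is created with validation_split=0.25 and subset="training", the held-out quarter can be served by a second flow_from_dataframe call with subset="validation". Here's a minimal sketch of that. make_generators is a hypothetical helper name of mine (not part of Keras), and switching shuffling off for the validation generator is my own assumption, not something the example above does.

```python
def make_generators(datagen, dataframe, directory,
                    x_col="Image", y_col="Id",
                    target_size=(100, 100), batch_size=32, seed=42):
    """Return (train_generator, valid_generator) from one ImageDataGenerator.

    `datagen` is expected to be an ImageDataGenerator created with
    validation_split set, e.g.
    ImageDataGenerator(rescale=1./255., validation_split=0.25).
    Hypothetical helper, not a Keras function.
    """
    # Arguments shared by both calls, so both subsets are read the same way.
    common = dict(
        dataframe=dataframe,
        directory=directory,
        x_col=x_col,
        y_col=y_col,
        batch_size=batch_size,
        seed=seed,
        class_mode="categorical",
        target_size=target_size,
    )
    train_generator = datagen.flow_from_dataframe(
        subset="training", shuffle=True, **common)
    # Assumption: keep validation batches in a fixed order (shuffle=False).
    valid_generator = datagen.flow_from_dataframe(
        subset="validation", shuffle=False, **common)
    return train_generator, valid_generator


# Usage with the names from the example above:
# train_generator, valid_generator = make_generators(
#     datagen, traindf, "../input/humpback-whale-identification/train/")
```

Passing the same datagen to both calls matters, because the split fraction lives on the ImageDataGenerator, and each call only selects which side of the split to read.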
Now sometimes the filename/ID in the CSV doesn't have an extension. In that case I used the following to add the extension, before calling flow_from_dataframe:
def append_ext(fn):
    # Add the missing ".jpg" extension to a filename/ID from the CSV.
    return fn + ".jpg"

traindf["Image"] = traindf["Image"].apply(append_ext)
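One thing to watch with append_ext is that it adds ".jpg" even to names that already have an extension. A small variant that only appends when the extension is missing might look like this. ensure_ext is a hypothetical name of mine, and ".jpg" as the default is an assumption carried over from the example above.

```python
import os


def ensure_ext(fn, ext=".jpg"):
    """Append `ext` only when the filename has no extension yet."""
    _, current_ext = os.path.splitext(fn)
    return fn if current_ext else fn + ext


# Usage with the DataFrame from the example above:
# traindf["Image"] = traindf["Image"].apply(ensure_ext)
```

Note that an ID which itself contains a dot would be treated as already having an extension, so check a few rows of your CSV first.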
Well hope that is helpful! It's my first try at answering a Q here :-)
The Kaggle dataset/challenge is here https://www.kaggle.com/c/humpback-whale-identification
Note: I've seen people do this in all kinds of ways on Kaggle, but this seems the easiest!
Then you can use pandas to read the csv file as a DataFrame using the function read_csv:
import pandas as pd
df = pd.read_csv('csvfilename', delimiter=',')
Then use the flow_from_dataframe function of the ImageDataGenerator class.
There is a tutorial at this link
flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
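The signature lists validate_filenames=True. As far as I understand it, that makes the generator check each filename and leave out rows whose image can't be found, which can quietly shrink the dataset. Here's a minimal sketch for listing such rows up front. missing_files is a hypothetical helper (not part of Keras or pandas), and the "Image" column and directory in the usage comment are borrowed from the other answer.

```python
import os


def missing_files(filenames, directory):
    """Return the filenames that do not exist as files inside `directory`."""
    return [fn for fn in filenames
            if not os.path.isfile(os.path.join(directory, fn))]


# Usage:
# missing = missing_files(df["Image"], "../input/humpback-whale-identification/train/")
# print(len(missing), "files listed in the CSV were not found")
```

If the list is long, a missing extension in the CSV column is a likely cause.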