Question
I have a set of image files in a directory, train_images = './data/images', and their labels in train_labels = './data/labels.csv'.
For example, there are 1000 images in train_images, named 377.jpg, 17814.jpg, and so on. The classes they correspond to are saved in a separate CSV file.
EDIT: Here are a few rows from the CSV file:
   ID         Class
0  377.jpg    MIDDLE
1  17814.jpg  YOUNG
2  21283.jpg  MIDDLE
3  16496.jpg  YOUNG
4  4487.jpg   MIDDLE
Here ID is the image file name and Class is the class it is associated with.
I could have used the usual
ImageDataGenerator().flow_from_directory(train_images, class_mode='binary', batch_size=64)
but the problem is that the labels are in a CSV file. I could rename all the files using os, put the files for each class in a different directory, and then load them, but that looks so immature and foolish.
How can I load data in Keras for a CNN where each image is of dimension (h,w,c)?
Answer 1:
Here's my example using the flow_from_dataframe function of ImageDataGenerator, with pandas to read the CSV. The CSV I was using had two columns:
x_col="Image"
y_col="Id"
So the first column is the filename, e.g. xxxx.jpg, and the second column is the class. In this case, since the data is from the Kaggle humpback whale challenge, the class is the kind of whale. The image files are in the directory "../input/humpback-whale-identification/train/".
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Activation, Conv2D, Flatten,
                                     Dropout, MaxPooling2D, BatchNormalization)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Import from tensorflow.keras throughout rather than mixing it with standalone keras
from tensorflow.keras import regularizers, optimizers
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
So read the CSV using pandas:
# dtype=str keeps both the file names and the class labels as strings
traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)
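Now create the generator using ImageDataGenerator:
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="../input/humpback-whale-identification/train/",
    x_col="Image",              # column holding the image file names
    y_col="Id",                 # column holding the class labels
    subset="training",          # the part not held out by validation_split
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical",   # one-hot encoded labels
    target_size=(100, 100))     # every image is resized to 100x100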
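Since validation_split=0.25 is set above, the held-out quarter can be read from the same datagen with subset="validation". Here is a minimal sketch of that, assuming the same column names and target size as above. The helper name make_generators and the shuffle=False choice for validation are my own additions, not part of the original answer.

```python
def make_generators(traindf, image_dir, val_fraction=0.25):
    """Return (train_generator, valid_generator) built from one DataFrame.

    Hypothetical helper: assumes columns "Image" (file name) and "Id" (class).
    """
    # Imported inside the function so the sketch can be loaded without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255.0, validation_split=val_fraction)

    # Arguments shared by both subsets, so the two generators stay consistent.
    shared = dict(
        dataframe=traindf,
        directory=image_dir,
        x_col="Image",
        y_col="Id",
        batch_size=32,
        seed=42,
        class_mode="categorical",
        target_size=(100, 100),
    )

    train_generator = datagen.flow_from_dataframe(
        subset="training", shuffle=True, **shared)
    # Validation batches are usually left unshuffled (an assumption here).
    valid_generator = datagen.flow_from_dataframe(
        subset="validation", shuffle=False, **shared)
    return train_generator, valid_generator
```

Both generators can then be passed to model.fit as the training data and validation_data respectively.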
Sometimes the filename/ID in the CSV doesn't have an extension. In that case I used the following to add the extension:
def append_ext(fn):
    return fn + ".jpg"

traindf["Image"] = traindf["Image"].apply(append_ext)
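One thing to watch: append_ext adds ".jpg" unconditionally, so running it on names that already have an extension gives "xxxx.jpg.jpg". A small variant that only appends when the extension is missing might look like the sketch below. The name ensure_ext is hypothetical, and it treats anything after the last dot as an extension.

```python
import os


def ensure_ext(filename, ext=".jpg"):
    """Append ext only if filename does not already have an extension."""
    _, current_ext = os.path.splitext(filename)
    return filename if current_ext else filename + ext


# Usage with the DataFrame above (assumes a column named "Image"):
# traindf["Image"] = traindf["Image"].apply(ensure_ext)
print(ensure_ext("377"))
print(ensure_ext("377.jpg"))
```

This makes the step safe to run more than once on the same DataFrame.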
Well hope that is helpful! It's my first try at answering a Q here :-)
The Kaggle dataset/challenge is here: https://www.kaggle.com/c/humpback-whale-identification
Note: I've seen people doing this in all kinds of ways on Kaggle! But this seems the easiest!
Answer 2:
You can use pandas to read the CSV file as a DataFrame using the read_csv function:
import pandas as pd
df = pd.read_csv('csvfilename', delimiter=',')
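As a rough illustration with the rows from the question, here is what that read might look like. The in-memory sample stands in for './data/labels.csv', and I'm assuming the file is comma-separated with an "ID,Class" header row, which the question doesn't show directly. Passing dtype=str follows the first answer.

```python
import io

import pandas as pd

# Stand-in for './data/labels.csv' (assumed layout: "ID,Class" header, comma-separated).
sample_csv = io.StringIO(
    "ID,Class\n"
    "377.jpg,MIDDLE\n"
    "17814.jpg,YOUNG\n"
    "21283.jpg,MIDDLE\n"
    "16496.jpg,YOUNG\n"
    "4487.jpg,MIDDLE\n"
)

# dtype=str keeps file names and labels as plain strings.
df = pd.read_csv(sample_csv, dtype=str)

# Quick sanity checks before handing the DataFrame to Keras.
class_counts = df["Class"].value_counts().to_dict()
print(df.columns.tolist())
print(class_counts)
```

Checking the column names and class counts up front makes it easier to pick x_col, y_col and class_mode in the next step.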
Then use the flow_from_dataframe function of the ImageDataGenerator class.
There is a tutorial at this link. For reference, the function signature is:
flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
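Mapped onto the question's data, the call might look something like the sketch below. It assumes the DataFrame has the columns ID and Class shown in the question and that the images live in './data/images'. The function name question_generator is hypothetical, the rescale value is borrowed from the first answer, and target_size is just the default from the signature above.

```python
def question_generator(df, image_dir="./data/images", batch_size=64):
    """Build a generator for a DataFrame with "ID" and "Class" columns.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so the sketch can be loaded without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255.0)
    return datagen.flow_from_dataframe(
        dataframe=df,
        directory=image_dir,
        x_col="ID",                 # image file names, e.g. 377.jpg
        y_col="Class",              # labels, e.g. MIDDLE / YOUNG
        class_mode="categorical",   # one-hot labels; works for any number of classes
        target_size=(256, 256),     # each image is resized to this (h, w)
        batch_size=batch_size,
    )
```

With color_mode left at 'rgb', each batch of images should have shape (batch_size, 256, 256, 3), so every image is (h, w, c) as the question asks.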
Source: https://stackoverflow.com/questions/59464409/loading-images-in-keras-for-cnn-from-directory-but-label-in-csv-file