Question
I am using model.fit_generator to train my binary (two-class) model because I feed the input images directly from folders. How can I also get the confusion matrix (TP, TN, FP, FN) in this case? Normally I use the confusion_matrix function from sklearn.metrics, but it requires the predicted and the actual labels, and here I have neither. I can probably get the predicted labels with predict = model.predict_generator(validation_generator), but I don't know how my model assigns labels to my images. The general structure of my input folders is:
train/
    class1/
        img1.jpg
        img2.jpg
        ........
    class2/
        IMG1.jpg
        IMG2.jpg
test/
    class1/
        img1.jpg
        img2.jpg
        ........
    class2/
        IMG1.jpg
        IMG2.jpg
        ........
and some blocks of my code are:
train_generator = train_datagen.flow_from_directory(
    'train', target_size=(50, 50), batch_size=batch_size,
    class_mode='binary', color_mode='grayscale')

validation_generator = test_datagen.flow_from_directory(
    'test', target_size=(50, 50), batch_size=batch_size,
    class_mode='binary', color_mode='grayscale')

model.fit_generator(
    train_generator, steps_per_epoch=250, epochs=40,
    validation_data=validation_generator,
    validation_steps=21)
So the above code automatically picks up the two classes as input, but I don't know which folder it treats as class 0 and which as class 1.
Answer 1:
I've managed it in the following way, using keras.utils.Sequence.
import numpy as np
from sklearn.metrics import confusion_matrix
from keras.utils import Sequence


class MySequence(Sequence):
    def __init__(self, *args, **kwargs):
        # initialize; see the Keras manual on implementing Sequence methods
        ...

    def __len__(self):
        # number of batches
        return self.length

    def __getitem__(self, index):
        # return the index-th complete (inputs, labels) batch
        ...


# create the data generator
data_gen = MySequence(evaluation_set, batch_size=10)
n_batches = len(data_gen)

confusion_matrix(
    np.concatenate([np.argmax(data_gen[i][1], axis=1) for i in range(n_batches)]),
    np.argmax(m.predict_generator(data_gen, steps=n_batches), axis=1)
)
The implemented class returns batches of data as tuples, which means you never have to hold the whole dataset in RAM. Note that the batching must be implemented in __getitem__, and that method must return the same batch for the same index. Unfortunately, this code iterates over the data twice: first it builds the array of true labels from the returned batches, then it calls the model's predict_generator method.
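If the double pass is a concern, here is a minimal single-pass sketch, assuming the model is named m and data_gen is the Sequence defined above (as in the answer's code): it walks the batches once and collects true and predicted labels together.

import numpy as np
from sklearn.metrics import confusion_matrix

y_true_parts, y_pred_parts = [], []
for i in range(len(data_gen)):
    x_batch, y_batch = data_gen[i]                    # one complete (inputs, labels) batch
    y_true_parts.append(np.argmax(y_batch, axis=1))   # true class per sample
    y_pred_parts.append(np.argmax(m.predict(x_batch), axis=1))  # predicted class per sample

print(confusion_matrix(np.concatenate(y_true_parts),
                       np.concatenate(y_pred_parts)))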
Answer 2:
You can view the mapping from class names to class indices through the class_indices attribute of your train_generator or validation_generator objects, as in

train_generator.class_indices
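For the folder layout in the question, flow_from_directory assigns indices to the class subdirectories in alphanumeric order, so the printout would look roughly like this (exact names depend on your folders):

print(train_generator.class_indices)
# e.g. {'class1': 0, 'class2': 1}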
Answer 3:
probabilities = model.predict_generator(generator=test_generator)

will give us a set of probabilities, and

y_true = test_generator.classes

will give us the true labels. Because this is a binary classification problem, you have to threshold the probabilities to get predicted labels. To do that you can use

y_pred = probabilities > 0.5

Now we have the true and predicted labels for the test dataset, so the confusion matrix is given by
import matplotlib
from sklearn.metrics import confusion_matrix
from mlxtend.plotting import plot_confusion_matrix  # plotting helper that takes conf_mat/show_normed

font = {'family': 'Times New Roman', 'size': 12}
matplotlib.rc('font', **font)

mat = confusion_matrix(y_true, y_pred)
plot_confusion_matrix(conf_mat=mat, figsize=(8, 8), show_normed=False)
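One caveat: test_generator.classes is stored in directory order, while predict_generator returns predictions in the order the generator yields batches, so the two only line up if the test generator was created with shuffle=False. A minimal end-to-end sketch under that assumption, reusing the test_datagen and folder layout from the question:

import numpy as np
from sklearn.metrics import confusion_matrix

# rebuild the validation generator without shuffling so that
# .classes and the prediction order match one-to-one
test_generator = test_datagen.flow_from_directory(
    'test', target_size=(50, 50), batch_size=batch_size,
    class_mode='binary', color_mode='grayscale', shuffle=False)

probabilities = model.predict_generator(test_generator)
y_pred = (probabilities > 0.5).astype(int).ravel()   # threshold to 0/1 labels
y_true = test_generator.classes

print(confusion_matrix(y_true, y_pred))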
Source: https://stackoverflow.com/questions/47907061/how-to-get-confusion-matrix-when-using-model-fit-generator