Macro metrics (recall/F1…) for multiclass CNN


Question


I use a CNN for image classification on an unbalanced dataset. I'm totally new to the TensorFlow backend. It's a multiclass problem (not multilabel) with 16 classes, and the classes are one-hot encoded.

I want to compute MACRO metrics for each epoch: F1, precision and recall.
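
Here "macro" means computing the metric per class and then taking the unweighted mean over all classes; a minimal sketch of what scikit-learn's average='macro' does (made-up labels, 3 classes instead of 16 for brevity):

from sklearn.metrics import recall_score

# Toy example: the per-class recalls are 1/2, 1/1 and 2/3; macro recall is their plain mean
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
print(recall_score(y_true, y_pred, average='macro'))  # (0.5 + 1.0 + 0.6667) / 3 ≈ 0.72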

I found code to print those macro metrics, but it only works on the validation set. From: https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score, precision_score, recall_score

class Metrics(Callback):

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        # Predict on the whole validation set and threshold the probabilities at 0.5
        val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()
        val_targ = self.validation_data[1]
        _val_f1 = f1_score(val_targ, val_predict, average='macro')
        _val_recall = recall_score(val_targ, val_predict, average='macro')
        _val_precision = precision_score(val_targ, val_predict, average='macro')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print(" — val_f1: %f — val_precision: %f — val_recall %f" % (_val_f1, _val_precision, _val_recall))
        return

metrics = Metrics()
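
Note: depending on the Keras version, self.validation_data may not be populated inside a callback; if that is the case, a possible workaround (a sketch; the constructor arguments are hypothetical names) is to hand the validation arrays to the callback yourself:

class MacroMetrics(Callback):
    # Hypothetical variant that receives the validation arrays explicitly
    # instead of relying on self.validation_data being filled in by Keras.
    def __init__(self, valid_X, valid_y):
        super().__init__()
        self.valid_X = valid_X
        self.valid_y = valid_y

    def on_epoch_end(self, epoch, logs=None):
        val_predict = np.asarray(self.model.predict(self.valid_X)).round()
        print(" — val_f1: %f" % f1_score(self.valid_y, val_predict, average='macro'))

# metrics = MacroMetrics(valid_X, valid_label)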

I'm not even sure this code really works, since it uses

 val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

Could round() cause errors in the multiclass case?
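
To make the concern concrete, here is a quick illustration with made-up probabilities (4 classes instead of 16, same issue): with a softmax output every probability can fall below 0.5, so round() can produce an all-zero row (no class predicted at all), whereas argmax always picks exactly one class per sample.

import numpy as np

probs = np.array([[0.35, 0.30, 0.20, 0.15]])   # made-up softmax output for one sample

print(probs.round())          # [[0. 0. 0. 0.]]  -> no class predicted at all
print(probs.argmax(axis=1))   # [0]              -> exactly one class per sample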

And I use the following code to print the metrics (only recall, since that is the important metric for me) on the training set (it is also computed on the validation set, since it is passed to model.compile). The code has been adapted from: Custom macro for recall in keras



from keras import backend as K

def recall(y_true, y_pred):
    # Recall for one class column: true positives / all actual positives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def unweightedRecall(y_true, y_pred):
    # Macro recall: unweighted mean of the 16 per-class recalls
    return (recall(y_true[:, 0], y_pred[:, 0]) + recall(y_true[:, 1], y_pred[:, 1])
            + recall(y_true[:, 2], y_pred[:, 2]) + recall(y_true[:, 3], y_pred[:, 3])
            + recall(y_true[:, 4], y_pred[:, 4]) + recall(y_true[:, 5], y_pred[:, 5])
            + recall(y_true[:, 6], y_pred[:, 6]) + recall(y_true[:, 7], y_pred[:, 7])
            + recall(y_true[:, 8], y_pred[:, 8]) + recall(y_true[:, 9], y_pred[:, 9])
            + recall(y_true[:, 10], y_pred[:, 10]) + recall(y_true[:, 11], y_pred[:, 11])
            + recall(y_true[:, 12], y_pred[:, 12]) + recall(y_true[:, 13], y_pred[:, 13])
            + recall(y_true[:, 14], y_pred[:, 14]) + recall(y_true[:, 15], y_pred[:, 15])) / 16.
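
For reference, the 16-term sum can be written more compactly; a sketch that should be equivalent, assuming the labels are 16 one-hot columns:

def unweightedRecall(y_true, y_pred, n_classes=16):
    # Same computation as above, looping over the class columns
    return sum(recall(y_true[:, i], y_pred[:, i]) for i in range(n_classes)) / n_classes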



I run my model with

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=[unweightedRecall, "accuracy"])  # compile with the unweightedRecall metric

train = model.fit_generator(image_gen.flow(train_X, train_label, batch_size=64),
                            epochs=100, verbose=1,
                            validation_data=(valid_X, valid_label),
                            class_weight=class_weights,
                            callbacks=[metrics],
                            steps_per_epoch=len(train_X)/64)  # run the model
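
class_weights is not defined in the snippet above; one common way to build it for an unbalanced set, assuming train_label is one-hot and every class occurs at least once, is scikit-learn's compute_class_weight:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical helper: derive 'balanced' per-class weights from the one-hot training labels
y_int = np.argmax(train_label, axis=1)
weights = compute_class_weight(class_weight='balanced', classes=np.arange(16), y=y_int)
class_weights = dict(enumerate(weights))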

The VALIDATION macro recall differs between the two pieces of code,

i.e. (compare val_unweightedRecall and val_recall):

Epoch 10/100
19/18 [===============================] - 13s 703ms/step - loss: 1.5167 - unweightedRecall: 0.1269 - acc: 0.5295 - val_loss: 1.5339 - val_unweightedRecall: 0.1272 - val_acc: 0.5519
 — val_f1: 0.168833 — val_precision: 0.197502 — val_recall 0.15636

Why do I get different values for my macro validation recall from the two pieces of code?

Bonus question: for those who have already tried this, is it really worth using a custom loss based on the metric of interest (recall, for example), or does categorical cross-entropy with class weights produce the same result?


Answer 1:


Let me answer both questions, in the opposite order:

You can't use recall as the basis for a custom loss: it is not convex! If you do not fully understand why recall, precision or F1 can't be used as a loss, please take the time to look at the role of the loss (it is, after all, a huge parameter of your model).
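
One concrete way to see the problem (a sketch using TensorFlow directly, with made-up numbers): a recall built from round()/argmax() is piecewise constant in the network's outputs, so it provides no gradient for the optimizer to follow.

import tensorflow as tf

logits = tf.Variable([[2.0, -1.0, 0.5]])          # made-up logits for one sample
y_true = tf.constant([[1.0, 0.0, 0.0]])

with tf.GradientTape() as tape:
    probs = tf.nn.softmax(logits)
    hard_pred = tf.round(probs)                   # hard 0/1 decision: a non-differentiable step
    tp = tf.reduce_sum(y_true * hard_pred)
    recall_like = tp / (tf.reduce_sum(y_true) + 1e-7)

print(tape.gradient(recall_like, logits))         # None (or zeros): nothing for gradient descent to use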

Indeed, the round is intended for a binary problem: with a single output neuron the model predicts the probability of class 1, and rounding turns anything above 0.5 into a 1 and anything below into a 0 ("if it's not one class, it's the other"). But in your case that is wrong. Let's go through the code:

val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

From the inside out, it takes the data (self.validation_data[0]) and predicts, for each sample, a vector of 16 probabilities. Rounding each of them at 0.5 means that in some cases you won't predict any class at all. Because of this mistake, everything computed afterwards is also wrong.

Now, the solution. You want to compute the mean recall at every epoch. By the way, regarding "but it only works on the validation set": yes, that is intended. You use the validation set to validate the model, not the training set, otherwise it is cheating.

So recall is equal to the true positives over all actual positives. Let's do that!

def recall(y_true, y_pred):
    recall = 0.
    pred = K.argmax(y_pred, axis=-1)   # predicted class per sample
    true = K.argmax(y_true, axis=-1)   # actual class per sample
    for i in range(16):
        p = K.cast(K.equal(pred, i), 'float32')
        t = K.cast(K.equal(true, i), 'float32')
        # True positives for class i: predicted i AND actually i
        common = K.sum(p * t)
        # Divide by all actual positives of class i
        recall += common / (K.sum(t) + K.epsilon())
    return recall / 16.

This gives you the mean recall over all classes. You could also print the value per class.
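
For completeness, this metric plugs into compile like any other (a minimal sketch):

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=[recall, "accuracy"])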

Tell me if you have any questions!

For an implementation of the binary recall, see this question, from which the code is adapted.



Source: https://stackoverflow.com/questions/56261014/macro-metrics-recall-f1-for-multiclass-cnn
