Easy way of counting precision, recall and F1-score in R

北战南征 submitted on 2019-12-03 04:25:03


I am using an rpart classifier in R. I want to test the trained classifier on test data. That part is fine: I can use the predict.rpart function.

But I also want to calculate precision, recall and F1 score.

My question is: do I have to write functions for those myself, or is there a function in R or in any of the CRAN libraries for that?


The ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):

library(ROCR)

y <- ... # logical vector of positive / negative cases
predictions <- ... # numeric vector of predicted scores (e.g. class probabilities)

pred <- prediction(predictions, y)

# Precision-recall curve (recall on the x-axis, precision on the y-axis)
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# Area under the ROC curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)



Using the caret package:


library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)
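For intuition, the two caret calls above boil down to counts of true positives, false positives and false negatives. Here is a minimal base-R sketch on made-up toy vectors (the data and the names tp, fp and fn are mine, for illustration only; 1 is assumed to be the positive class):

```r
# Hypothetical toy data: 1 = positive, 0 = negative
y           <- c(1, 1, 1, 1, 0, 0, 0, 0)  # actual classes
predictions <- c(1, 1, 0, 0, 1, 0, 0, 0)  # predicted classes

tp <- sum(predictions == 1 & y == 1)  # true positives
fp <- sum(predictions == 1 & y == 0)  # false positives
fn <- sum(predictions == 0 & y == 1)  # false negatives

precision <- tp / (tp + fp)  # share of predicted positives that are correct
recall    <- tp / (tp + fn)  # share of actual positives that were found
F1        <- (2 * precision * recall) / (precision + recall)
```

F1 can also be written directly as 2 * tp / (2 * tp + fp + fn), which gives the same value.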

A generic function that works for binary and multi-class classification without using any package is:

f1_score <- function(predicted, expected, positive.class="1") {
    # Give both factors the same levels, in the same order, so that the
    # confusion matrix is square and its diagonal holds the correct predictions
    expected  <- as.factor(expected)
    predicted <- factor(as.character(predicted), levels=levels(expected))
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assuming that F1 is zero when it's not possible to compute it
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) f1[[positive.class]] else mean(f1)
}

Some comments about the function:

  • An F1 that cannot be computed (NA) is treated as zero
  • positive.class is used only for the binary F1
  • For multi-class problems, the macro-averaged F1 is computed
  • If predicted and expected have different levels, predicted receives the levels of expected
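To make the macro-averaging concrete, here is a small base-R sketch on a made-up three-class example (the labels "a", "b", "c" and the data are hypothetical). It builds the confusion matrix with identical levels for both factors, then takes the unweighted mean of the per-class F1 values:

```r
# Hypothetical three-class toy data
expected  <- c("a", "a", "a", "b", "b", "c", "c", "c")
predicted <- c("a", "a", "b", "b", "c", "c", "c", "a")

# Same levels in the same order for both factors, so the diagonal of the
# table counts the correct predictions for each class
lv <- sort(unique(expected))
cm <- table(expected  = factor(expected,  levels = lv),
            predicted = factor(predicted, levels = lv))

precision <- diag(cm) / colSums(cm)  # per class: correct / predicted as that class
recall    <- diag(cm) / rowSums(cm)  # per class: correct / actually that class
f1        <- 2 * precision * recall / (precision + recall)

macro_f1 <- mean(f1)  # unweighted mean over the classes
```

This sketch skips the NA handling, since every class in the toy data has at least one correct prediction.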


I noticed the comment about the F1 score being needed for binary classes. I suspect that it usually is. But a while ago I wrote the following, in which I was classifying into several groups denoted by numbers. This may be of use to you:

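  # Treats the vectors as class labels.
  # act (actual) and prd (predicted) must be whole numbers.
  # Assumed setup: df holds the actual and predicted labels side by side.
  df <- data.frame(act = act, prd = prd)
  for (i in seq(min(act), max(act))) {
    tp <- nrow(df[df$prd == i & df$act == i, ])  # true positives for class i
    fp <- nrow(df[df$prd == i & df$act != i, ])  # false positives for class i
    fn <- nrow(df[df$prd != i & df$act == i, ])  # false negatives for class i
    f1 <- (2 * tp) / (2 * tp + fp + fn)          # per-class F1 from the counts above
    print(f1)
  }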



confusionMatrix() from the caret package can be used with the optional argument positive, which specifies which factor level should be treated as the positive class.

confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")

This code will also give additional values such as the F1 score, accuracy, etc.


Just to update this, as I came across this thread just now: the confusionMatrix function in caret computes all of these things for you automatically.

cm <- confusionMatrix(prediction, reference = test_set$label)

# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems

You can substitute any of the following for "F1" above to extract the relevant values as well:

"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"

I think this behaves slightly differently when you're only doing a binary classification problem: there, $byClass is a named vector rather than a matrix, so the F1 score is cm[["byClass"]]["F1"]. In both cases, all of these values are computed for you and stored in the confusionMatrix object under $byClass.


