Question
I am using an rpart classifier in R. I would like to evaluate the trained classifier on test data, which is fine - I can use the predict.rpart function.
But I also want to calculate precision, recall and F1 score.
My question is - do I have to write functions for those myself, or is there a function for this in R or in any of the CRAN libraries?
Answer 1:
The ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):
library(ROCR)
...
y <- ...           # logical array of positive / negative cases
predictions <- ... # array of predictions
pred <- prediction(predictions, y)

# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# ROC area under the curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)
...
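ROCR also exposes precision, recall, and the F measure directly. A minimal self-contained sketch with simulated data (all names and values below are illustrative, not from the original answer):

library(ROCR)
set.seed(1)
y <- sample(c(FALSE, TRUE), 200, replace = TRUE)  # simulated true labels
predictions <- runif(200)                         # simulated scores
pred <- prediction(predictions, y)

prec <- performance(pred, "prec")
rec  <- performance(pred, "rec")
f    <- performance(pred, "f")  # F measure; with the default alpha this is the balanced F (F1)

# Values at the cutoff closest to 0.5
cutoffs <- pred@cutoffs[[1]]
i <- which.min(abs(cutoffs - 0.5))
c(precision = prec@y.values[[1]][i],
  recall    = rec@y.values[[1]][i],
  F1        = f@y.values[[1]][i])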
Answer 2:
Using the caret package:
library(caret)
y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions
precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")
F1 <- (2 * precision * recall) / (precision + recall)
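For a concrete run, here is a hedged sketch with simulated factors (the levels "0"/"1" and the sample size are illustrative):

library(caret)
set.seed(1)
y           <- factor(sample(c("0", "1"), 100, replace = TRUE))  # simulated truth
predictions <- factor(sample(c("0", "1"), 100, replace = TRUE))  # simulated predictions

precision <- posPredValue(predictions, y, positive = "1")
recall    <- sensitivity(predictions, y, positive = "1")
F1        <- (2 * precision * recall) / (precision + recall)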
A generic function that works for binary and multi-class classification without using any package is:
f1_score <- function(predicted, expected, positive.class = "1") {
    # Align factor levels so the confusion matrix rows and columns match
    expected  <- as.factor(expected)
    predicted <- factor(as.character(predicted), levels = levels(expected))
    cm <- as.matrix(table(expected, predicted))
    precision <- diag(cm) / colSums(cm)
    recall    <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))
    # Assume that F1 is zero when it cannot be computed
    f1[is.na(f1)] <- 0
    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) f1[positive.class] else mean(f1)
}
Some comments about the function:
- An F1 of NA is assumed to be zero.
- positive.class is used only for binary F1; for multi-class problems the macro-averaged F1 is computed.
- If predicted and expected have different levels, predicted will receive the expected levels.
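A usage sketch with made-up labels (the vectors below are illustrative):

# Binary case: returns the F1 of the positive class
expected  <- c("1", "0", "1", "1", "0", "1")
predicted <- c("1", "0", "0", "1", "0", "1")
f1_score(predicted, expected, positive.class = "1")  # 0.857

# Multi-class case: returns the macro-averaged F1
expected3  <- c("a", "b", "c", "a", "b", "c")
predicted3 <- c("a", "b", "b", "a", "c", "c")
f1_score(predicted3, expected3)                      # 0.667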
Answer 3:
I noticed the comment that the F1 score is needed for binary classes, and I suspect it usually is. But a while ago I wrote this while doing classification into several groups denoted by number. It may be of use to you...
calcF1Scores <- function(act, prd) {
    # treats the vectors like classes
    # act and prd must be positive whole numbers
    df <- data.frame(act = act, prd = prd)
    scores <- list()
    for (i in seq(min(act), max(act))) {
        tp <- nrow(df[df$prd == i & df$act == i, ])
        fp <- nrow(df[df$prd == i & df$act != i, ])
        fn <- nrow(df[df$prd != i & df$act == i, ])
        f1 <- (2 * tp) / (2 * tp + fp + fn)
        scores[[i]] <- f1
    }
    return(scores)
}
print(mean(unlist(calcF1Scores(c(1,1,3,4,5),c(1,2,3,4,5)))))
print(mean(unlist(calcF1Scores(c(1,2,3,4,5),c(1,2,3,4,5)))))
Answer 4:
confusionMatrix() from the caret package can be used along with the optional positive argument, which specifies which factor level should be treated as the positive class:
confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")
This call also reports additional values such as the F1 score, accuracy, etc.
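A minimal sketch with simulated factors (predicted and reference below are illustrative names; Funded above is the asker's own reference vector):

library(caret)
set.seed(1)
reference <- factor(sample(c("0", "1"), 50, replace = TRUE))
predicted <- factor(sample(c("0", "1"), 50, replace = TRUE))
confusionMatrix(predicted, reference, mode = "prec_recall", positive = "1")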
Answer 5:
Just to update this as I came across this thread now: the confusionMatrix function in caret computes all of these for you automatically.
cm <- confusionMatrix(prediction, reference = test_set$label)
# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems
You can substitute "F1" above with any of the following to extract the relevant values:
"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"
I think this behaves slightly differently when you're only doing a binary classification problem, but in both cases all of these values are computed for you when you look inside the confusionMatrix object, under $byClass.
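A hedged multi-class sketch (three simulated classes; all names are illustrative):

library(caret)
set.seed(1)
labels     <- factor(sample(c("a", "b", "c"), 60, replace = TRUE))
prediction <- factor(sample(c("a", "b", "c"), 60, replace = TRUE), levels = levels(labels))
cm <- confusionMatrix(prediction, reference = labels)
cm[["byClass"]][ , "F1"]  # one F1 per class; mean() of this gives the macro-averaged F1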
Answer 6:
We can simply get the F1 value from caret's confusionMatrix function:
result <- confusionMatrix(Prediction, Label)
# View confusion matrix overall
result
# F1 value
result$byClass[7]  # or, more robustly by name: result$byClass["F1"]
Answer 7:
You can also use the confusionMatrix() provided by the caret package. The output includes, among others, Sensitivity (also known as recall) and Pos Pred Value (also known as precision). F1 can then be easily computed, as stated above, as:
F1 <- (2 * precision * recall) / (precision + recall)
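For instance, assuming a binary confusionMatrix object stored in cm (the name cm is an assumption for illustration):

precision <- cm$byClass["Pos Pred Value"]
recall    <- cm$byClass["Sensitivity"]
F1 <- unname((2 * precision * recall) / (precision + recall))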
Source: https://stackoverflow.com/questions/8499361/easy-way-of-counting-precision-recall-and-f1-score-in-r