Question
I'm working on a very unbalanced classification problem, and I'm using AUPRC as the metric in caret. I'm getting very different AUPRC results for the test set from caret and from the PRROC package.
In order to make it easy, the reproducible example uses the PimaIndiansDiabetes dataset from package mlbench:
rm(list=ls())
library(caret)
library(mlbench)
library(PRROC)
#load data, renaming it to 'datos'
data(PimaIndiansDiabetes)
datos=PimaIndiansDiabetes[,1:9]
# training and test
set.seed(998)
inTraining <- createDataPartition(datos[,9], p = .8, list = FALSE)
training <- datos[inTraining,]
testing <- datos[ -inTraining,]
#training
control <- trainControl(method = "cv", summaryFunction = prSummary,
                        classProbs = TRUE)
set.seed(998)
rf.tune <- train(training[,1:8], training[,9], method = "rf",
                 trControl = control, metric = "AUC")
#evaluating AUPRC in test set
matriz <- cbind(testing[,9], predict(rf.tune, testing[,1:8], type = "prob"),
                predict(rf.tune, testing[,1:8]))
names(matriz) <- c("obs", levels(testing[,9]), "pred")
prSummary(matriz, levels(testing[,9]))
#calculating AUPRC through pr.curve
#checking positive class
confusionMatrix(predict(rf.tune, testing[,1:8]), testing[,9],
                mode = "prec_recall")  #'Positive' Class : neg
#preparing data for pr.curve
indice_POS=which(testing[,9]=="neg")
indice_NEG=which(testing[,9]=="pos")
#the classification scores of only the data points belonging to the
#positive class
clas_score_POS=predict(rf.tune,testing[,1:8],type="prob")[indice_POS,1]
#the classification scores of only the data points belonging to the
#negative class
clas_score_NEG=predict(rf.tune,testing[,1:8],type="prob")[indice_NEG,2]
pr.curve(clas_score_POS,clas_score_NEG)
The value from PRROC is 0.9053432 and from caret prSummary it is 0.8714607. In my unbalanced case, the differences are broader (AUPRC = 0.1688446 with SMOTE resampling, via control$sampling <- "smote", and 0.01429 with PRROC).
Is this because of the different methods used to calculate AUPRC in those packages, or am I doing something wrong?
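For reference, the SMOTE variant mentioned above was set up roughly like this (a sketch only; rf.tune.smote is a name I made up for this post, and depending on the caret version the "smote" option needs an extra package such as DMwR or themis):
#same trainControl as before, with SMOTE applied inside each resampling fold
control$sampling <- "smote"
set.seed(998)
rf.tune.smote <- train(training[,1:8], training[,9], method = "rf",
                       trControl = control, metric = "AUC")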
UPDATED: I can't find bugs in my code. After missuse's answer, I'd like to make some remarks:
When you do prSummary(matriz, levels(testing[,9])) you get:
AUC Precision Recall F
0.8714607 0.7894737 0.9000000 0.8411215
which is consistent with
confusionMatrix(predict(rf.tune,testing[,1:8]),testing[,9],mode = "prec_recall")
Confusion Matrix and Statistics
Reference
Prediction neg pos
neg 90 23
pos 10 30
Accuracy : 0.7843
95% CI : (0.7106, 0.8466)
No Information Rate : 0.6536
P-Value [Acc > NIR] : 0.0003018
Kappa : 0.4945
Mcnemar's Test P-Value : 0.0367139
Precision : 0.7965
Recall : 0.9000
F1 : 0.8451
Prevalence : 0.6536
Detection Rate : 0.5882
Detection Prevalence : 0.7386
Balanced Accuracy : 0.7330
'Positive' Class : neg
And with:
> MLmetrics::PRAUC(y_pred = matriz$neg, y_true = ifelse(matriz$obs == "neg", 1, 0))
[1] 0.8714607
As you can see in the last line, the 'Positive' class is 'neg', and I think missuse is treating 'pos' as the positive class, so we are computing different metrics. Moreover, when you print the trained rf, the results are also consistent with an expected AUC of ~0.87:
> rf.tune
Random Forest
615 samples
8 predictor
2 classes: 'neg', 'pos'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 554, 553, 553, 554, 554, 554, ...
Resampling results across tuning parameters:
mtry AUC Precision Recall F
2 0.8794965 0.7958683 0.8525 0.8214760
5 0.8786427 0.8048463 0.8325 0.8163032
8 0.8528028 0.8110820 0.8325 0.8192225
I'm not worried about the difference in this case (0.87 from caret vs 0.90 from PRROC), but I'm very worried about 0.1688446 from caret vs 0.01429 from PRROC in the unbalanced case. Might this be because the numeric divergence between the different implementations is amplified in the unbalanced case? And if there is a numerical difference between the implementations, how is it that they give the identical value 0.8714607 on the test set?
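To make the point about the positive class concrete, here are the two conventions side by side (the second value is the one missuse reports below):
#AUPRC with 'neg' treated as the positive class (what prSummary reports here)
MLmetrics::PRAUC(y_pred = matriz$neg, y_true = ifelse(matriz$obs == "neg", 1, 0))  #0.8714607
#AUPRC with 'pos' treated as the positive class
MLmetrics::PRAUC(y_pred = matriz$pos, y_true = ifelse(matriz$obs == "pos", 1, 0))  #0.7066323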
Answer 1:
I believe you are making several mistakes in your code.
First of all, caret::prSummary uses MLmetrics::PRAUC to compute the AUPRC. It should be called like this:
MLmetrics::PRAUC(y_pred = matriz$pos, y_true = ifelse(matriz$obs == "pos", 1, 0))
#output
0.7066323
using the positive-class probability and a numeric 0/1 vector of true classes (1 for the positive class).
The same result is obtained by using:
caret::prSummary(matriz, levels(testing[,9])[2])
MLmetrics::PRAUC uses ROCR::prediction to construct the curve:
pred_obj <- ROCR::prediction(matriz$pos, ifelse(matriz$obs == "pos", 1, 0))
perf_obj <- ROCR::performance(pred_obj, measure = "prec",
                              x.measure = "rec")
and the curve looks like:
ROCR::plot(perf_obj, ylim = c(0,1))
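As a rough cross-check (my own code, not what MLmetrics runs internally), the area can be recomputed by hand from the recall/precision vectors stored in the performance object; it should land very close to the PRAUC value above:
rec  <- perf_obj@x.values[[1]]   #recall
prec <- perf_obj@y.values[[1]]   #precision
keep <- is.finite(rec) & is.finite(prec)   #drop the undefined precision at the first cutoff
sum(diff(rec[keep]) * (head(prec[keep], -1) + tail(prec[keep], -1)) / 2)   #trapezoidal area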
When one uses PRROC::pr.curve there are several ways to define the inputs. One is to provide the positive-class probabilities of the observations whose true class is positive, and the positive-class probabilities of the observations whose true class is negative:
preds <- predict(rf.tune, testing[,1:8], type = "prob")[,2]  #prob of positive class
preds_pos <- preds[testing[,9] == "pos"]  #preds for true positive class
preds_neg <- preds[testing[,9] == "neg"]  #preds for true negative class
PRROC::pr.curve(preds_pos, preds_neg)
#truncated output
0.7254904
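For completeness, PRROC::pr.curve also accepts all scores at once plus a 0/1 label vector through weights.class0 (assuming I read its interface correctly); this should give the same area as the call above:
PRROC::pr.curve(scores.class0 = preds,
                weights.class0 = ifelse(testing[,9] == "pos", 1, 0))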
These two numbers (obtained by PRROC::pr.curve and MLmetrics::PRAUC) do not match. However, the curve
plot(PRROC::pr.curve(preds_pos, preds_neg, curve = TRUE))
looks just like the one above obtained using ROCR::plot.
To check:
res <- PRROC::pr.curve(preds_pos, preds_neg, curve = TRUE)
ROCR::plot(perf_obj, ylim = c(0,1), lty = 2, lwd = 2)
lines(res$curve[,1], res$curve[,2], col = "red", lty = 5)
They are the same. Therefore the difference in the obtained area is due to the different implementations in the mentioned packages.
These implementations can be checked by looking at the source for:
MLmetrics:::Area_Under_Curve #this one looks pretty straightforward
PRROC:::compute.pr #I haven't had the time to study this one, but if I had to bet I'd say it is more accurate for step-like curves.
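As a toy illustration of why this matters (made-up points, not either package's actual code), integrating the very same precision/recall points with two different rules already gives two different areas, and the gap grows when the curve has few, large steps, which is exactly what happens with a small positive class:
rec_toy  <- c(0.0, 0.5, 1.0)
prec_toy <- c(1.0, 0.5, 0.5)
#trapezoidal rule (linear interpolation between points)
sum(diff(rec_toy) * (head(prec_toy, -1) + tail(prec_toy, -1)) / 2)   #0.625
#step-wise rule (carry the left-hand precision forward)
sum(diff(rec_toy) * head(prec_toy, -1))                              #0.75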
Source: https://stackoverflow.com/questions/53301729/difference-between-auprc-in-caret-and-prroc