Question
I'm working on a very unbalanced classification problem, and I'm using AUPRC as the metric in caret. I'm getting very different AUPRC results for the test set from caret and from the PRROC package.
In order to make it easy, the reproducible example uses the PimaIndiansDiabetes dataset from package mlbench:
rm(list=ls())
library(caret)
library(mlbench)
library(PRROC)
#load data, renaming it to 'datos'
data(PimaIndiansDiabetes)
datos=PimaIndiansDiabetes[,1:9]
# training and test
set.seed(998)
inTraining <- createDataPartition(datos[,9], p = .8, list = FALSE)
training <- datos[inTraining,]
testing <- datos[ -inTraining,]
#training
control <- trainControl(method = "cv", summaryFunction = prSummary,
                        classProbs = TRUE)
set.seed(998)
rf.tune <- train(training[,1:8], training[,9], method = "rf",
                 trControl = control, metric = "AUC")
#evaluating AUPRC in test set
matriz <- cbind(testing[,9], predict(rf.tune, testing[,1:8], type = "prob"),
                predict(rf.tune, testing[,1:8]))
names(matriz) <- c("obs", levels(testing[,9]), "pred")
prSummary(matriz, levels(testing[,9]))
#calculating AUPRC through pr.curve
#checking positive class
confusionMatrix(predict(rf.tune, testing[,1:8]), testing[,9],
                mode = "prec_recall")  #'Positive' Class : neg
#preparing data for pr.curve
indice_POS=which(testing[,9]=="neg")
indice_NEG=which(testing[,9]=="pos")
#the classification scores of only the data points belonging to the
#positive class
clas_score_POS=predict(rf.tune,testing[,1:8],type="prob")[indice_POS,1]
#the classification scores of only the data points belonging to the
#negative class
clas_score_NEG=predict(rf.tune,testing[,1:8],type="prob")[indice_NEG,2]
pr.curve(clas_score_POS,clas_score_NEG)
The value from PRROC is 0.9053432 and from caret prSummary it is 0.8714607. In my unbalanced case, the differences are broader (AUPRC = 0.1688446 with SMOTE resampling, via control$sampling <- "smote", and 0.01429 with PRROC).
Is this because of the different methods used to calculate AUPRC in those packages, or am I doing something wrong?
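For reference, the SMOTE variant mentioned above was set up roughly like this (a sketch only; rf.tune.smote is a name I made up for this post, and depending on the caret version the "smote" option needs an extra package such as DMwR or themis):
#same trainControl as before, with SMOTE applied inside each resampling fold
control$sampling <- "smote"
set.seed(998)
rf.tune.smote <- train(training[,1:8], training[,9], method = "rf",
                       trControl = control, metric = "AUC")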
UPDATED: I can't find bugs in my code. After missuse's answer, I'd like to make some remarks:
When you do prSummary(matriz, levels(testing[,9])) you get:
AUC Precision Recall F
0.8714607 0.7894737 0.9000000 0.8411215
which is consistent with
confusionMatrix(predict(rf.tune,testing[,1:8]),testing[,9],mode = "prec_recall")
Confusion Matrix and Statistics
Reference
Prediction neg pos
neg 90 23
pos 10 30
Accuracy : 0.7843
95% CI : (0.7106, 0.8466)
No Information Rate : 0.6536
P-Value [Acc > NIR] : 0.0003018
Kappa : 0.4945
Mcnemar's Test P-Value : 0.0367139
Precision : 0.7965
Recall : 0.9000
F1 : 0.8451
Prevalence : 0.6536
Detection Rate : 0.5882
Detection Prevalence : 0.7386
Balanced Accuracy : 0.7330
'Positive' Class : neg
And with:
> MLmetrics::PRAUC(y_pred = matriz$neg, y_true = ifelse(matriz$obs == "neg", 1, 0))
[1] 0.8714607
As you can see in the last line, the 'Positive' class is 'neg', and I think missuse is treating 'pos' as the positive class, so we are computing different metrics. Moreover, when you print the trained rf, the results are also consistent with an expected AUC of ~0.87:
> rf.tune
Random Forest
615 samples
8 predictor
2 classes: 'neg', 'pos'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 554, 553, 553, 554, 554, 554, ...
Resampling results across tuning parameters:
mtry AUC Precision Recall F
2 0.8794965 0.7958683 0.8525 0.8214760
5 0.8786427 0.8048463 0.8325 0.8163032
8 0.8528028 0.8110820 0.8325 0.8192225
I'm not worried about the difference in this case (0.87 from caret vs 0.90 from PRROC), but I'm very worried about 0.1688446 from caret vs 0.01429 from PRROC in the unbalanced case. Might this be because the numeric divergence between the different implementations is amplified in the unbalanced case? And if there is a numerical difference between the implementations, how is it that they give the identical value 0.8714607 on the test set?
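To make the point about the positive class concrete, here are the two conventions side by side (the second value is the one missuse reports below):
#AUPRC with 'neg' treated as the positive class (what prSummary reports here)
MLmetrics::PRAUC(y_pred = matriz$neg, y_true = ifelse(matriz$obs == "neg", 1, 0))  #0.8714607
#AUPRC with 'pos' treated as the positive class
MLmetrics::PRAUC(y_pred = matriz$pos, y_true = ifelse(matriz$obs == "pos", 1, 0))  #0.7066323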
Answer 1:
I believe you are making several mistakes in your code.
First of all, caret::prSummary uses MLmetrics::PRAUC to compute the AUPRC. It should be called like this:
MLmetrics::PRAUC(y_pred = matriz$pos, y_true = ifelse(matriz$obs == "pos", 1, 0))
#output
0.7066323
using the positive-class probability and a numeric 0/1 vector of true classes (1 for the positive class).
The same result is obtained by using:
caret::prSummary(matriz, levels(testing[,9])[2])
MLmetrics::PRAUC uses ROCR::prediction to construct the curve:
pred_obj <- ROCR::prediction(matriz$pos, ifelse(matriz$obs == "pos", 1, 0))
perf_obj <- ROCR::performance(pred_obj, measure = "prec",
                              x.measure = "rec")
and the curve looks like:
ROCR::plot(perf_obj, ylim = c(0,1))
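As a rough cross-check (my own code, not what MLmetrics runs internally), the area can be recomputed by hand from the recall/precision vectors stored in the performance object; it should land very close to the PRAUC value above:
rec  <- perf_obj@x.values[[1]]   #recall
prec <- perf_obj@y.values[[1]]   #precision
keep <- is.finite(rec) & is.finite(prec)   #drop the undefined precision at the first cutoff
sum(diff(rec[keep]) * (head(prec[keep], -1) + tail(prec[keep], -1)) / 2)   #trapezoidal area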
When one uses PRROC::pr.curve there are several ways to define the inputs. One is to provide the positive-class probabilities of the observations whose true class is positive, and the positive-class probabilities of the observations whose true class is negative:
preds <- predict(rf.tune, testing[,1:8], type = "prob")[,2]  #prob of positive class
preds_pos <- preds[testing[,9] == "pos"]  #preds for true positive class
preds_neg <- preds[testing[,9] == "neg"]  #preds for true negative class
PRROC::pr.curve(preds_pos, preds_neg)
#truncated output
0.7254904
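For completeness, PRROC::pr.curve also accepts all scores at once plus a 0/1 label vector through weights.class0 (assuming I read its interface correctly); this should give the same area as the call above:
PRROC::pr.curve(scores.class0 = preds,
                weights.class0 = ifelse(testing[,9] == "pos", 1, 0))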
These two numbers (obtained by PRROC::pr.curve and MLmetrics::PRAUC) do not match. However, the curve
plot(PRROC::pr.curve(preds_pos, preds_neg, curve = TRUE))
looks just like the one above obtained using ROCR::plot.
To check:
res <- PRROC::pr.curve(preds_pos, preds_neg, curve = TRUE)
ROCR::plot(perf_obj, ylim = c(0,1), lty = 2, lwd = 2)
lines(res$curve[,1], res$curve[,2], col = "red", lty = 5)
They are the same. Therefore the difference in the obtained area is due to the different implementations in the mentioned packages.
These implementations can be checked by looking at the source for:
MLmetrics:::Area_Under_Curve #this one looks pretty straightforward
PRROC:::compute.pr #I haven't had the time to study this one, but if I had to bet I'd say it is more accurate for step-like curves.
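As a toy illustration of why this matters (made-up points, not either package's actual code), integrating the very same precision/recall points with two different rules already gives two different areas, and the gap grows when the curve has few, large steps, which is exactly what happens with a small positive class:
rec_toy  <- c(0.0, 0.5, 1.0)
prec_toy <- c(1.0, 0.5, 0.5)
#trapezoidal rule (linear interpolation between points)
sum(diff(rec_toy) * (head(prec_toy, -1) + tail(prec_toy, -1)) / 2)   #0.625
#step-wise rule (carry the left-hand precision forward)
sum(diff(rec_toy) * head(prec_toy, -1))                              #0.75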
Source: https://stackoverflow.com/questions/53301729/difference-between-auprc-in-caret-and-prroc