Calculate AUC in R?

前端未结

关注

 10  1193

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

相关标签:

10条回答

广开言路

2020-12-07 10:07
As mentioned by others, you can compute the AUC using the ROCR package. With the ROCR package you can also plot the ROC curve, lift curve and other model selection measures.

You can compute the AUC directly without using any package by using the fact that the AUC is equal to the probability that a true positive is scored greater than a true negative.

For example, if pos.scores is a vector containing a score of the positive examples, and neg.scores is a vector containing the negative examples then the AUC is approximated by:
```
> mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T))
[1] 0.7261
```
will give an approximation of the AUC. You can also estimate the variance of the AUC by bootstrapping:
```
> aucs = replicate(1000,mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)))
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

挽巷

2020-12-07 10:08

Currently top voted answer is incorrect, because it disregards ties. When positive and negative scores are equal, then AUC should be 0.5. Below is corrected example.

computeAUC <- function(pos.scores, neg.scores, n_sample=100000) {
  # Args:
  #   pos.scores: scores of positive observations
  #   neg.scores: scores of negative observations
  #   n_samples : number of samples to approximate AUC

  pos.sample <- sample(pos.scores, n_sample, replace=T)
  neg.sample <- sample(neg.scores, n_sample, replace=T)
  mean(1.0*(pos.sample > neg.sample) + 0.5*(pos.sample==neg.sample))
}

0 讨论(0)

北恋

2020-12-07 10:12
With the package pROC you can use the function auc() like this example from the help page:
```
> data(aSAH)
> 
> # Syntax (response, predictor):
> auc(aSAH$outcome, aSAH$s100b)
Area under the curve: 0.7314
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

长发绾君心

2020-12-07 10:12

I found some of the solutions here to be slow and/or confusing (and some of them don't handle ties correctly) so I wrote my own data.table based function auc_roc() in my R package mltools.

library(data.table)
library(mltools)

preds <- c(.1, .3, .3, .9)
actuals <- c(0, 0, 1, 1)

auc_roc(preds, actuals)  # 0.875

auc_roc(preds, actuals, returnDT=TRUE)
   Pred CountFalse CountTrue CumulativeFPR CumulativeTPR AdditionalArea CumulativeArea
1:  0.9          0         1           0.0           0.5          0.000          0.000
2:  0.3          1         1           0.5           1.0          0.375          0.375
3:  0.1          1         0           1.0           1.0          0.500          0.875

0 讨论(0)

攒了一身酷

2020-12-07 10:16

You can learn more about AUROC in this blog post by Miron Kursa:

https://mbq.me/blog/augh-roc/

He provides a fast function for AUROC:

# By Miron Kursa https://mbq.me
auroc <- function(score, bool) {
  n1 <- sum(!bool)
  n2 <- sum(bool)
  U  <- sum(rank(score)[!bool]) - n1 * (n1 + 1) / 2
  return(1 - U / n1 / n2)
}

Let's test it:

set.seed(42)
score <- rnorm(1e3)
bool  <- sample(c(TRUE, FALSE), 1e3, replace = TRUE)

pROC::auc(bool, score)
mltools::auc_roc(score, bool)
ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values[[1]]
auroc(score, bool)

0.51371668847094
0.51371668847094
0.51371668847094
0.51371668847094

auroc() is 100 times faster than pROC::auc() and computeAUC().

auroc() is 10 times faster than mltools::auc_roc() and ROCR::performance().

print(microbenchmark(
  pROC::auc(bool, score),
  computeAUC(score[bool], score[!bool]),
  mltools::auc_roc(score, bool),
  ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values,
  auroc(score, bool)
))

Unit: microseconds
                                                             expr       min
                                           pROC::auc(bool, score) 21000.146
                            computeAUC(score[bool], score[!bool]) 11878.605
                                    mltools::auc_roc(score, bool)  5750.651
 ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values  2899.573
                                               auroc(score, bool)   236.531
         lq       mean     median        uq        max neval  cld
 22005.3350 23738.3447 22206.5730 22710.853  32628.347   100    d
 12323.0305 16173.0645 12378.5540 12624.981 233701.511   100   c 
  6186.0245  6495.5158  6325.3955  6573.993  14698.244   100  b  
  3019.6310  3300.1961  3068.0240  3237.534  11995.667   100 ab  
   245.4755   253.1109   251.8505   257.578    300.506   100 a

0 讨论(0)

执笔经年

2020-12-07 10:17
The ROCR package will calculate the AUC among other statistics:
```
auc.tmp <- performance(pred,"auc"); auc <- as.numeric(auc.tmp@y.values)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页