Calculate AUC in R?

感动是毒 2020-12-07 09:45

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

10 Answers
  • 2020-12-07 10:18

    Combining code from ISL 9.6.3 ROC Curves with @J. Won.'s answer to this question and a few other sources, the following plots the ROC curve and prints the AUC in the bottom right of the plot.

    Below, probs is a numeric vector of predicted probabilities for binary classification, and test$label contains the true labels of the test data.

    library(ROCR)  # prediction(), performance()
    library(pROC)  # auc()
    
    rocplot <- function(pred, truth, ...) {
      # ROC curve (true positive rate vs. false positive rate) via ROCR
      predob <- prediction(pred, truth)
      perf <- performance(predob, "tpr", "fpr")
      plot(perf, ...)
    
      # AUC via pROC, formatted to four decimal places
      area <- pROC::auc(truth, pred)
      area <- format(round(as.numeric(area), 4), nsmall = 4)
      text(x = 0.8, y = 0.1, labels = paste("AUC =", area))
    
      # the reference x=y line
      segments(x0 = 0, y0 = 0, x1 = 1, y1 = 1, col = "gray", lty = 2)
    }
    
    rocplot(probs, test$label, col = "blue")
    

    This gives a plot like this: [image: ROC curve with the AUC printed in the bottom right]
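
    The snippet above assumes that probs and test already exist. As a rough sketch of where they might come from, the following simulates a data set, fits a logistic regression on a training split, and predicts probabilities for the held-out rows. The simulated data, the column name x, and the 150/50 split are all assumptions made for illustration, not part of the original answer.

```r
# Hypothetical setup: simulate a binary outcome that depends on one predictor
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$label <- rbinom(n, 1, plogis(1.5 * dat$x))

# Hold out 50 rows as test data
train_idx <- sample(n, 150)
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# Fit on the training rows, then predict probabilities for the test rows
fit <- glm(label ~ x, data = train, family = binomial)
probs <- predict(fit, newdata = test, type = "response")
```

    With these two objects in place, the rocplot(probs, test$label, col = "blue") call from the answer should have the inputs it expects.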

  • 2020-12-07 10:23

    Along the lines of erik's response, you should also be able to calculate the AUC directly by comparing all possible pairs of values from pos.scores and neg.scores (the scores of the positive and the negative examples, respectively):

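    # Cross join: one row per (positive, negative) pair of scores
    score.pairs <- merge(pos.scores, neg.scores)
    names(score.pairs) <- c("pos.score", "neg.score")
    # Fraction of pairs in which the positive example outscores the negative one
    sum(score.pairs$pos.score > score.pairs$neg.score) / nrow(score.pairs)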
    

    This is certainly less efficient than the sampling approach or pROC::auc, but it is more stable than the former and, unlike the latter, needs no extra package.
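
    The pairwise snippet takes pos.scores and neg.scores as given. A minimal sketch of how they could be derived from one score vector and one label vector, assuming hypothetical toy data in which the label 1 marks the positive class:

```r
# Hypothetical toy data: 1 = positive class, 0 = negative class
scores <- c(0.9, 0.7, 0.6, 0.4, 0.3, 0.1)
labels <- c(1, 1, 0, 1, 0, 0)

# Split the scores by true class
pos.scores <- scores[labels == 1]
neg.scores <- scores[labels == 0]

# Same pairwise computation as in the answer
score.pairs <- merge(pos.scores, neg.scores)
names(score.pairs) <- c("pos.score", "neg.score")
auc <- sum(score.pairs$pos.score > score.pairs$neg.score) / nrow(score.pairs)
```

    Here 8 of the 9 positive/negative pairs are ordered correctly, so auc is 8/9.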

    Related: when I tried this it gave similar results to pROC's value, but not exactly the same (off by 0.02 or so); the result was closer to the sample approach with very high N. If anyone has ideas why that might be I'd be interested.
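
    One thing that can make a strict pairwise comparison differ from an area computed under the ROC curve is tied scores: a strict > gives a tied positive/negative pair no credit, whereas the trapezoidal area counts it as one half. Whether that explains the gap described above is not something this sketch can confirm. It only shows the effect on hypothetical toy data with deliberate ties, and cross-checks the tie-aware value against the rank-based (Mann-Whitney) formula.

```r
# Hypothetical toy data with deliberately tied scores across the classes
pos.scores <- c(0.9, 0.8, 0.8, 0.6, 0.4)
neg.scores <- c(0.8, 0.6, 0.3, 0.2)

# Compare every positive score with every negative score
gt <- outer(pos.scores, neg.scores, ">")
eq <- outer(pos.scores, neg.scores, "==")

auc_strict <- mean(gt)           # ties get no credit
auc_ties <- mean(gt + 0.5 * eq)  # ties count as one half

# Rank-based equivalent: rank() assigns average ranks to ties by default
n_pos <- length(pos.scores)
n_neg <- length(neg.scores)
r <- rank(c(pos.scores, neg.scores))
auc_rank <- (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

    On this data auc_strict is 0.7, while auc_ties and auc_rank both come out at 0.775. The rank-based form also avoids building all pairs, which matters for long vectors.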

  • 2020-12-07 10:26

    I usually use the ROC function from the DiagnosisMed package. I like the graph it produces. The AUC is returned along with its confidence interval, and it is also shown on the graph.

    ROC(classLabels,scores,Full=TRUE)
    
  • 2020-12-07 10:30

    Without any additional packages:

    # Class 1 is treated as the positive class, class 2 as the negative class
    true_Y = c(1,1,1,1,2,1,2,1,2,2)
    probs = c(1,0.999,0.999,0.973,0.568,0.421,0.382,0.377,0.146,0.11)
    
    getROC_AUC = function(probs, true_Y){
        # Order the true labels by decreasing score
        idx = order(probs, decreasing = TRUE)
        roc_y = true_Y[idx]
    
        # Cumulative false positive rate (x) and true positive rate (y)
        stack_x = cumsum(roc_y == 2)/sum(roc_y == 2)
        stack_y = cumsum(roc_y == 1)/sum(roc_y == 1)
    
        # Sum of rectangles: width = step in FPR, height = TPR after the step
        n = length(roc_y)
        auc = sum((stack_x[2:n]-stack_x[1:(n-1)])*stack_y[2:n])
        return(list(stack_x=stack_x, stack_y=stack_y, auc=auc))
    }
    
    aList = getROC_AUC(probs, true_Y)
    
    stack_x = aList$stack_x
    stack_y = aList$stack_y
    auc = aList$auc
    
    # Plot the ROC curve with a grid and the AUC in the legend
    plot(stack_x, stack_y, type = "l", col = "blue", xlab = "False Positive Rate", ylab = "True Positive Rate", main = "ROC")
    axis(1, seq(0.0,1.0,0.1))
    axis(2, seq(0.0,1.0,0.1))
    abline(h=seq(0.0,1.0,0.1), v=seq(0.0,1.0,0.1), col="gray", lty=3)
    legend(0.7, 0.3, sprintf("%3.3f",auc), lty=c(1,1), lwd=c(2.5,2.5), col="blue", title = "AUC")
    

    [image: ROC curve drawn by the code above, with the AUC shown in the legend]
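
    The rectangle sum above steps through the examples one at a time, so its result can depend on how tied scores from different classes happen to be ordered. A possible variation, still in base R, is to take one ROC point per distinct score and apply the trapezoid rule. The sketch below assumes the same label coding as the answer (1 = positive, 2 = negative); the helper name auc_trapezoid and its positive argument are made up for illustration.

```r
# Hypothetical helper: trapezoidal AUC with one ROC point per distinct score
auc_trapezoid <- function(probs, true_Y, positive = 1) {
  is_pos <- true_Y == positive

  # Thresholds from the highest score to the lowest
  thresholds <- sort(unique(probs), decreasing = TRUE)

  # Rates when everything scoring at or above the threshold is called positive;
  # the leading 0 is the (0, 0) corner of the ROC curve
  tpr <- c(0, vapply(thresholds, function(th) sum(probs >= th & is_pos) / sum(is_pos), numeric(1)))
  fpr <- c(0, vapply(thresholds, function(th) sum(probs >= th & !is_pos) / sum(!is_pos), numeric(1)))

  # Trapezoid rule: width in FPR times the mean of the two neighbouring TPR values
  k <- length(tpr)
  sum(diff(fpr) * (tpr[-1] + tpr[-k]) / 2)
}

# Data from the answer above
true_Y <- c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2)
probs <- c(1, 0.999, 0.999, 0.973, 0.568, 0.421, 0.382, 0.377, 0.146, 0.11)
auc_answer_data <- auc_trapezoid(probs, true_Y)

# A tiny case with a tie between a positive and a negative example
auc_tied <- auc_trapezoid(c(0.8, 0.8, 0.3), c(1, 2, 2))
```

    On the answer's data the only tie is between two positives, so this gives the same 0.875 as the rectangle sum. In the tied case the single tied pair counts as one half, giving 0.75.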
