I have some models. Using the ROCR package on a vector of the predicted class percentages, I have a performance object. Plotting the performance object with the spe…
Here are two solutions, based on the ROCR and pROC packages:
threshold1 <- function(predict, response) {
  # performance(pred, measure, x.measure): y.values = sensitivity, x.values = specificity
  perf <- ROCR::performance(ROCR::prediction(predict, response), "sens", "spec")
  df <- data.frame(cut = perf@alpha.values[[1]], sens = perf@y.values[[1]], spec = perf@x.values[[1]])
  # cutoff that maximizes sensitivity + specificity
  df[which.max(df$sens + df$spec), "cut"]
}
threshold2 <- function(predict, response) {
  r <- pROC::roc(response, predict)
  r$thresholds[which.max(r$sensitivities + r$specificities)]
}
data(ROCR.simple, package = "ROCR")
threshold1(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5014893
threshold2(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5006387
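Both functions pick the cutoff that maximizes sensitivity + specificity, which is the same as maximizing Youden's J statistic (J = sensitivity + specificity - 1). The small gap between the two results is consistent with pROC reporting thresholds as midpoints between adjacent predictor values: (0.5014893 + 0.4997881) / 2 = 0.5006387, and 0.4997881 is the next-lower predicted value in ROCR.simple (it appears in the cutoffs table further down). The following is a minimal base-R sketch of the search itself, assuming labels coded 0/1 with 1 as the positive class and a case counted as positive when its score is >= the cutoff. youden_cutoff() is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: brute-force Youden cutoff search in base R.
# Assumes labels are numeric 0/1 (1 = positive) and that a case is
# predicted positive when its score is >= the candidate cutoff.
youden_cutoff <- function(scores, labels) {
  cuts <- sort(unique(scores), decreasing = TRUE)  # candidate cutoffs
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # sensitivity and specificity at every candidate cutoff
  sens <- vapply(cuts, function(k) sum(scores >= k & labels == 1) / n_pos, numeric(1))
  spec <- vapply(cuts, function(k) sum(scores < k & labels == 0) / n_neg, numeric(1))
  j <- sens + spec - 1  # Youden's J
  cuts[which.max(j)]    # first (highest) cutoff attaining the maximum
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1, 1, 0, 1, 1, 0, 0, 0)
youden_cutoff(scores, labels)  # 0.4 (sensitivity 1, specificity 0.75)
```

The loop is quadratic in the number of distinct scores, which keeps it easy to read but makes it slower than the sorted cumulative counts the packages use. Under the stated assumptions it should land on the same cutoff as threshold1().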
See also the OptimalCutpoints package, which provides many algorithms for finding optimal thresholds.
This is why str is my favorite R function:
library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
..@ x.name : chr "False positive rate"
..@ y.name : chr "True positive rate"
..@ alpha.name : chr "Cutoff"
..@ x.values :List of 1
.. ..$ : num [1:201] 0 0 0 0 0.00935 ...
..@ y.values :List of 1
.. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ...
..@ alpha.values:List of 1
.. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...
Aha! It's an S4 class, so we can use @ to access the slots. Here's how you make a data.frame:
cutoffs <- data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]],
tpr=perf@y.values[[1]])
> head(cutoffs)
cut fpr tpr
1 Inf 0.000000000 0.00000000
2 0.9910964 0.000000000 0.01075269
3 0.9846673 0.000000000 0.02150538
4 0.9845992 0.000000000 0.03225806
5 0.9834944 0.009345794 0.03225806
6 0.9706413 0.009345794 0.04301075
If you have an fpr threshold you want to hit, you can subset this data.frame to find the maximum tpr below that fpr threshold:
cutoffs <- cutoffs[order(cutoffs$tpr, decreasing=TRUE),]
> head(subset(cutoffs, fpr < 0.2))
cut fpr tpr
96 0.5014893 0.1495327 0.8494624
97 0.4997881 0.1588785 0.8494624
98 0.4965132 0.1682243 0.8494624
99 0.4925969 0.1775701 0.8494624
100 0.4917356 0.1869159 0.8494624
101 0.4901199 0.1962617 0.8494624
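The subset-and-sort steps above can be wrapped in a small function that returns a single row. best_cut_for_fpr() is a hypothetical helper; it is a sketch that assumes a data.frame with columns cut, fpr and tpr built as above, uses fpr <= max_fpr (the answer uses a strict <), and breaks ties on tpr by preferring the lowest fpr. The toy data frame is made up for illustration.

```r
# Hypothetical helper: highest-tpr cutoff whose fpr does not exceed max_fpr.
# Expects a data.frame with columns cut, fpr, tpr (as built from the
# ROCR performance object). Returns NULL if no row qualifies.
best_cut_for_fpr <- function(cutoffs, max_fpr) {
  ok <- cutoffs[cutoffs$fpr <= max_fpr, ]
  if (nrow(ok) == 0) return(NULL)
  # highest tpr first; among equal tpr, lowest fpr first
  ok <- ok[order(-ok$tpr, ok$fpr), ]
  ok[1, ]
}

# made-up example data, not from ROCR.simple
toy <- data.frame(cut = c(Inf, 0.9, 0.8, 0.7, 0.6),
                  fpr = c(0, 0, 0.1, 0.1, 0.3),
                  tpr = c(0, 0.2, 0.5, 0.6, 0.9))
best_cut_for_fpr(toy, 0.2)  # the row with cut = 0.7, fpr = 0.1, tpr = 0.6
```

Returning the whole row rather than just the cutoff keeps the achieved fpr and tpr visible next to the threshold.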
The pROC package includes the function coords for calculating the best threshold:
library(pROC)
# my_response: observed classes; my_predictor: predicted scores/probabilities
my_roc <- roc(my_response, my_predictor)
coords(my_roc, "best", ret = "threshold")
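By default, coords(..., "best") uses Youden's index. pROC also accepts best.method = "closest.topleft", which picks the ROC point minimizing (1 - sensitivity)^2 + (1 - specificity)^2, for example coords(my_roc, "best", ret = "threshold", best.method = "closest.topleft"). Below is a base-R sketch of that criterion, assuming labels coded 0/1 with 1 as the positive class and a case counted as positive when its score is >= the cutoff. closest_topleft_cutoff() is a hypothetical helper; unlike pROC, it returns an observed score rather than a midpoint between adjacent scores.

```r
# Hypothetical helper: cutoff whose ROC point is closest to the top-left
# corner (sensitivity = 1, specificity = 1).
# Assumes numeric 0/1 labels (1 = positive) and "positive if score >= cutoff".
closest_topleft_cutoff <- function(scores, labels) {
  cuts <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  sens <- vapply(cuts, function(k) sum(scores >= k & labels == 1) / n_pos, numeric(1))
  spec <- vapply(cuts, function(k) sum(scores < k & labels == 0) / n_neg, numeric(1))
  d2 <- (1 - sens)^2 + (1 - spec)^2  # squared distance to (1, 1)
  cuts[which.min(d2)]
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1, 1, 0, 1, 1, 0, 0, 0)
closest_topleft_cutoff(scores, labels)  # 0.4
```

The two criteria often agree, as in this toy case, but they can pick different cutoffs when the ROC curve is asymmetric.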