How to use prediction score in creating ROC curve with Scikit-Learn

Question

I have the following code:

from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 1]
prediction_scores = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc
# 0.875

In this example the interpretation of prediction_scores is straightforward: the higher the score, the more confident the prediction.

Now I have another set of prediction scores. They are not fractions between 0 and 1, and the interpretation is reversed: the lower the score, the more confident the prediction.

prediction_scores_v2 = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]
# intended to be equivalent to prediction_scores above

My question is: how can I scale prediction_scores_v2 so that it gives an AUC similar to the first one?

To put it another way, scikit-learn's roc_curve expects y_score to be probability estimates of the positive class. How should I handle a y_score that is a probability estimate of the wrong class?

Answer 1:

For AUC, you really only care about the order of your predictions. So as long as the transformed scores rank the samples the way you intend (higher meaning more likely positive), you can just get your predictions into a format that AUC will accept.
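
To make the "only the order matters" point concrete, here is a minimal sketch using the data from the question. The two transforms (a linear rescale and a logarithm) are arbitrary choices for illustration, not something the question asked for; any strictly increasing function should behave the same way.

```python
import math
from sklearn.metrics import roc_auc_score

actual = [1, 1, 1, 0, 0, 1]
prediction_scores = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]

# Two strictly increasing transforms of the same scores (arbitrary examples).
shifted = [100 * s + 5 for s in prediction_scores]
logged = [math.log(s) for s in prediction_scores]

# The ranking of the samples is unchanged, so the AUC is unchanged too.
auc_original = roc_auc_score(actual, prediction_scores)
auc_shifted = roc_auc_score(actual, shifted)
auc_logged = roc_auc_score(actual, logged)

print(auc_original, auc_shifted, auc_logged)  # 0.875 0.875 0.875
```

Note that the transformed scores are not probabilities at all, and the result is still the same as in the question.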

You'll want to divide by the max to get your predictions to be between 0 and 1 (this assumes none of them are negative), and then subtract from 1 since lower is better in your case:

# Scale into [0, 1], then invert so that a higher value means a more confident positive
max_pred = max(prediction_scores_v2)
prediction_scores_v2 = [1 - x / max_pred for x in prediction_scores_v2]

false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores_v2, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc
# 0.8125
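
As a side note, the scaling step is probably optional. The scikit-learn documentation describes y_score as probability estimates, confidence values, or non-thresholded decision values, so simply negating the raw scores should be enough to reverse the ordering. A minimal sketch comparing the two, where the names negated and scaled are just for illustration:

```python
from sklearn.metrics import roc_auc_score

actual = [1, 1, 1, 0, 0, 1]
prediction_scores_v2 = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]

# Lower raw score means a more confident positive, so flip the sign.
negated = [-x for x in prediction_scores_v2]

# The divide-by-max-and-invert scaling, for comparison.
max_pred = max(prediction_scores_v2)
scaled = [1 - x / max_pred for x in prediction_scores_v2]

auc_negated = roc_auc_score(actual, negated)
auc_scaled = roc_auc_score(actual, scaled)

print(auc_negated, auc_scaled)  # 0.8125 0.8125
```

Both versions rank the samples identically, which is why the AUC matches. Negation also works when some scores are negative, where dividing by the max would not land in [0, 1].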

Answer 2:

"How can I treat the value if the y_score I have is probability estimates of the wrong class?"

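This is a really cheap shot, but have you considered reversing the original class list, as in:

# swap the 0 and 1 labels
actual = [abs(x - 1) for x in actual]

Then, you could still apply the scaling part of the normalization @Tchotchke proposed (dividing by the max). Skip the subtraction from 1 in that case, since flipping the labels already accounts for the reversed scale and doing both would invert the ranking twice.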

Still, in the end, @BrenBarn seems right. If possible, have an in-depth look at how these values are created and/or used in the other prediction tool.
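
For what it's worth, the label-flipping idea can be sketched as follows, using the data from the question with the raw, untransformed scores. Passing pos_label=0 to roc_curve should have the same effect as rewriting the label list by hand. The variable names are only illustrative.

```python
from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 1]
prediction_scores_v2 = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]

# Option A: flip the labels and keep the raw scores.
flipped = [1 - y for y in actual]
fpr_a, tpr_a, _ = roc_curve(flipped, prediction_scores_v2, pos_label=1)
auc_flipped = auc(fpr_a, tpr_a)

# Option B: keep the labels and tell roc_curve that 0 is the positive class.
fpr_b, tpr_b, _ = roc_curve(actual, prediction_scores_v2, pos_label=0)
auc_pos0 = auc(fpr_b, tpr_b)

print(auc_flipped, auc_pos0)  # 0.8125 0.8125
```

Keep in mind that the resulting curve describes detection of class 0, so the true and false positive rates refer to that class, even though the AUC comes out the same as with inverted scores.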

Source: https://stackoverflow.com/questions/37202548/how-to-use-prediction-score-in-creating-roc-curve-with-scikit-learn

Tags

python

machine-learning

scikit-learn

roc