Question
I have the following code:
from sklearn.metrics import roc_curve, auc
actual = [1,1,1,0,0,1]
prediction_scores = [0.9,0.9,0.9,0.1,0.1,0.1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc
# 0.875
In this example the interpretation of prediction_scores is straightforward: the higher the score, the more confident the prediction.
Now I have another set of prediction scores. They are not confined to the range 0 to 1, and the interpretation is reversed: the lower the score, the more confident the prediction.
prediction_scores_v2 = [10.3,10.3,10.2,10.5,2000.34,2000.34]
# so this is equivalent
My question is: how can I scale the values in prediction_scores_v2 so that they give an AUC score similar to the first one?
To put it another way, scikit-learn's roc_curve requires y_score to be probability estimates of the positive class. How should I treat the values if the y_score I have is a probability estimate of the wrong class?
Answer 1:
For AUC, you really only care about the order of your predictions. So as long as that ordering is preserved, you can transform your predictions into whatever format AUC will accept.
You'll want to divide by the max to get your predictions to be between 0 and 1, and then subtract from 1 since lower is better in your case:
# Scale into [0, 1], then invert so that a higher value means a more confident prediction
max_pred = max(prediction_scores_v2)
prediction_scores_v2 = [1 - x / max_pred for x in prediction_scores_v2]
false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores_v2, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
# 0.8125
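Since only the ordering matters, the division by the maximum is not strictly required: simply negating the scores should rank the examples the same way. The sketch below checks this without scikit-learn, using a hypothetical helper `pairwise_auc` (not part of any library) that computes AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    example scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


actual = [1, 1, 1, 0, 0, 1]
raw_scores = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]  # lower = more confident

# Two order-reversing transforms: plain negation, and the scale-and-invert above
negated = [-x for x in raw_scores]
max_raw = max(raw_scores)
scaled = [1 - x / max_raw for x in raw_scores]

auc_raw = pairwise_auc(actual, raw_scores)   # scores used in the wrong direction
auc_negated = pairwise_auc(actual, negated)
auc_scaled = pairwise_auc(actual, scaled)

print(auc_raw, auc_negated, auc_scaled)  # 0.1875 0.8125 0.8125
```

Both transforms give the same value, and using the raw scores unchanged gives 1 minus that value, which is what you would expect from a score that points the wrong way.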
Answer 2:
"How can I treat the value if the y_score I have is probability estimates of the wrong class?"
This is a really cheap shot, but have you considered reversing the original class list, as in:
actual = [abs(x-1) for x in actual]
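Then, you could still apply the division by the maximum that @Tchotchke proposed, but skip the subtraction from 1: reversing both the labels and the direction of the scores would cancel out.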
Still, in the end, @BrenBarn seems right. If possible, have an in-depth look at how these values are created and/or used in the other prediction tool.
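As a minimal sketch of the label-reversal idea, assuming the raw scores really do mean "higher = more likely class 0": the example below feeds the unmodified prediction_scores_v2 to roc_curve twice, once with flipped labels and once with the original labels but pos_label=0. The variable names other than those in the question are illustrative.

```python
from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 1]
prediction_scores_v2 = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]

# Option 1: flip the labels so that the class the scores point to becomes 1
flipped = [1 - y for y in actual]
fpr_flip, tpr_flip, _ = roc_curve(flipped, prediction_scores_v2, pos_label=1)
auc_flipped = auc(fpr_flip, tpr_flip)

# Option 2: keep the labels and tell roc_curve that class 0 is the positive class
fpr_p0, tpr_p0, _ = roc_curve(actual, prediction_scores_v2, pos_label=0)
auc_pos0 = auc(fpr_p0, tpr_p0)

print(auc_flipped, auc_pos0)  # both 0.8125
```

Either way the resulting ROC curve describes how well the scores detect class 0, so the true and false positive rates refer to that class. The AUC value matches the one obtained by inverting the scores.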
Source: https://stackoverflow.com/questions/37202548/how-to-use-prediction-score-in-creating-roc-curve-with-scikit-learn