How to use prediction score in creating ROC curve with Scikit-Learn

為{幸葍}努か submitted on 2019-12-14 01:13:52

Question


I have the following code:

from sklearn.metrics import roc_curve, auc

actual      = [1,1,1,0,0,1]
prediction_scores = [0.9,0.9,0.9,0.1,0.1,0.1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc
# 0.875

In this example the interpretation of prediction_scores is straightforward: the higher the score, the more confident the prediction.

Now I have another set of prediction scores. They are not confined to the range 0 to 1, and the interpretation is reversed: the lower the score, the more confident the prediction.

prediction_scores_v2 = [10.3,10.3,10.2,10.5,2000.34,2000.34]
# so this is equivalent 

My question is: how can I rescale prediction_scores_v2 so that it gives an AUC score similar to the first one?

To put it another way, scikit-learn's roc_curve expects y_score to be scores for the positive class, such as probability estimates or confidence values. How can I treat the value if the y_score I have is probability estimates of the wrong class?


Answer 1:


For AUC, you really only care about the order of your predictions. As long as that order is right (a higher value means a more confident positive prediction), you can apply any transformation that gets your predictions into a format roc_curve will accept.
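
To illustrate why only the order matters, here is a minimal sketch that computes AUC by counting correctly ranked (positive, negative) pairs, with ties counting as half. The helper name pairwise_auc is hypothetical and not part of scikit-learn; the data is the first example from the question.

```python
from itertools import product

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

actual = [1, 1, 1, 0, 0, 1]
prediction_scores = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]

auc_raw = pairwise_auc(actual, prediction_scores)
# A strictly increasing transform leaves every pairwise comparison unchanged.
auc_shifted = pairwise_auc(actual, [100 * s + 3 for s in prediction_scores])
print(auc_raw, auc_shifted)  # 0.875 0.875
```

Because the result depends only on comparisons between scores, the scale of the values is irrelevant; only the direction has to be right.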

You'll want to divide by the max to get your predictions to be between 0 and 1, and then subtract from 1 since lower is better in your case:

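max_pred = max(prediction_scores_v2)
# Rescale to [0, 1] and flip, so that a higher value means a more confident positive prediction
prediction_scores_v2[:] = [1 - x / max_pred for x in prediction_scores_v2]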

false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, prediction_scores_v2, pos_label=1)
roc_auc = auc(false_positive_rate, true_positive_rate)
# 0.8125
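
Since only the ranking matters, simply negating the scores should give the same result as the rescaling. The sketch below checks this with roc_auc_score from scikit-learn; raw_scores is a hypothetical name for a fresh copy of the question's second score list, used because the snippet above overwrites prediction_scores_v2 in place.

```python
from sklearn.metrics import roc_auc_score

actual = [1, 1, 1, 0, 0, 1]
raw_scores = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]  # lower = more confident

# Option 1: negate, so that a higher value means a more confident positive.
negated = [-x for x in raw_scores]

# Option 2: the divide-by-max-and-subtract-from-1 rescaling.
max_pred = max(raw_scores)
rescaled = [1 - x / max_pred for x in raw_scores]

auc_negated = roc_auc_score(actual, negated)
auc_rescaled = roc_auc_score(actual, rescaled)
print(auc_negated, auc_rescaled)  # 0.8125 0.8125
```

Negation also works when some scores are zero or negative, where dividing by the max would not land in the range 0 to 1.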



Answer 2:


"How can I treat the value if the y_score I have is probability estimates of the wrong class?"

This is a really cheap shot, but have you considered reversing the original class list, as in:

actual      = [abs(x-1) for x in actual]

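Then, you could still apply the scaling @Tchotchke proposed (dividing by the max), but without the subtraction from 1: reversing both the labels and the scores would cancel out and give 1 minus the intended AUC.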
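
As a rough sketch of the label-reversal idea, the snippet below compares flipping the labels by hand with passing pos_label=0 to roc_curve, which should amount to the same thing. It uses the raw, untransformed scores from the question; the names raw_scores, flipped, auc_flipped and auc_pos0 are illustrative.

```python
from sklearn.metrics import roc_curve, auc

actual = [1, 1, 1, 0, 0, 1]
raw_scores = [10.3, 10.3, 10.2, 10.5, 2000.34, 2000.34]  # high = likely class 0

# Option A: flip the labels so the original class 0 becomes the positive class.
flipped = [1 - y for y in actual]
fpr_a, tpr_a, _ = roc_curve(flipped, raw_scores, pos_label=1)
auc_flipped = auc(fpr_a, tpr_a)

# Option B: keep the labels and tell roc_curve that class 0 is the positive one.
fpr_b, tpr_b, _ = roc_curve(actual, raw_scores, pos_label=0)
auc_pos0 = auc(fpr_b, tpr_b)

print(auc_flipped, auc_pos0)  # 0.8125 0.8125
```

Both options describe how well a high raw score identifies the original class 0, which yields the same AUC as flipping the scores and keeping class 1 as positive.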

Still, in the end, @BrenBarn seems right. If possible, have an in-depth look at how these values are created and/or used in the other prediction tool.



Source: https://stackoverflow.com/questions/37202548/how-to-use-prediction-score-in-creating-roc-curve-with-scikit-learn
