Plotting a ROC curve in scikit yields only 3 points

 ̄綄美尐妖づ submitted on 2019-12-20 18:44:19

Question


TL;DR: scikit-learn's roc_curve function returns only 3 points for a certain dataset. Why could this be, and how do we control how many points we get back?

I'm trying to draw a ROC curve, but consistently get a "ROC triangle".

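from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
y = data['target'].values
X = data[['feature']].values

model = lr.fit(X, y)

# per-class log-probabilities (predict_log_proba, not predict_proba)
probas_ = model.predict_log_proba(X)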

To make sure the lengths match:

print(len(y))
print(len(probas_[:, 1]))

Both print 13759.

Then running:

false_pos_rate, true_pos_rate, thresholds = roc_curve(y, probas_[:, 1])
print(false_pos_rate)

prints [ 0. 0.28240129 1. ]

If I inspect thresholds, I get array([ 0.4822225 , -0.5177775 , -0.84595197]) (always only 3 points).

It is therefore no surprise that my ROC curve looks like a triangle.

What I cannot understand is why scikit's roc_curve is only returning 3 points. Help hugely appreciated.
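A note on the numbers above: the question scores with `predict_log_proba`, and the first threshold, 0.4822225, is exactly -0.5177775 + 1. The scikit-learn releases of that time set the first threshold to `max(y_score) + 1` (recent releases use `np.inf` instead), so the other two thresholds, -0.5177775 and -0.84595197, are presumably the only two distinct scores in the input. Using log-probabilities is not itself the cause: a ROC curve depends only on how the scores rank the samples, and `log` preserves that ranking. The minimal sketch below checks this on hypothetical synthetic data (an integer feature with 20 levels stands in for the question's `data`, which is not shown).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Hypothetical stand-in for the question's `data`: one integer-valued
# feature with 20 distinct levels and a binary target that depends on it.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(2000, 1)).astype(float)
y = (X[:, 0] + rng.normal(scale=5.0, size=2000) > 10).astype(int)

model = LogisticRegression().fit(X, y)

# Two monotonically related scores for the positive class from the same model.
fpr_p, tpr_p, _ = roc_curve(y, model.predict_proba(X)[:, 1])
fpr_l, tpr_l, _ = roc_curve(y, model.predict_log_proba(X)[:, 1])

# log() is strictly increasing, so the ranking of the samples, and therefore
# every ROC point, is the same for both scores.
print(np.array_equal(fpr_p, fpr_l), np.array_equal(tpr_p, tpr_l))
print(len(fpr_p), "ROC points from", len(np.unique(model.predict_proba(X)[:, 1])), "distinct scores")
```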


Answer 1:


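The number of points depends on the number of unique values in y_score: roc_curve places at most one threshold at each distinct score (plus a starting point at (0, 0)). Since the input vector probas_[:, 1] has only 2 unique values, 3 points is the correct output. With a single feature, logistic regression's predicted probability is a monotonic function of that feature, so this usually means the feature itself takes only two distinct values.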
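A minimal sketch of this on hypothetical synthetic data: the same binary target is modeled once from a 0/1 feature and once from a continuous one. The 0/1 feature yields two distinct probabilities and therefore three ROC points. To get more points you need a score with more distinct values; `roc_curve`'s `drop_intermediate` parameter (default `True` in current scikit-learn) only controls whether suboptimal intermediate thresholds are pruned, and setting it to `False` returns one point per distinct score plus the starting point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Hypothetical data: a binary target driven by a continuous variable.
rng = np.random.default_rng(42)
n = 1000
x_cont = rng.normal(size=n)
y = (x_cont + rng.normal(size=n) > 0).astype(int)
x_bin = (x_cont > 0).astype(float)  # same signal, collapsed to 0/1


def positive_scores(x, y):
    """Fit a one-feature logistic regression and return P(y=1) for each sample."""
    X = x.reshape(-1, 1)
    return LogisticRegression().fit(X, y).predict_proba(X)[:, 1]


s_bin = positive_scores(x_bin, y)
s_cont = positive_scores(x_cont, y)

fpr_bin, tpr_bin, _ = roc_curve(y, s_bin)
fpr_cont, tpr_cont, _ = roc_curve(y, s_cont)
# Keep every distinct threshold instead of pruning suboptimal ones.
fpr_all, tpr_all, _ = roc_curve(y, s_cont, drop_intermediate=False)

print("binary feature:", len(np.unique(s_bin)), "unique scores ->", len(fpr_bin), "points")
print("continuous feature:", len(np.unique(s_cont)), "unique scores ->", len(fpr_cont), "points")
print("drop_intermediate=False:", len(fpr_all), "points")
```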




Answer 2:


I had the same problem with a different example. My mistake was passing the outcomes at a given threshold (the predicted class labels) as the y_score argument of roc_curve, instead of the probabilities. That also gives a plot with only three points.
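A sketch of that mistake on hypothetical synthetic data: `model.predict` returns hard 0/1 labels (the outcome at the model's default threshold), which collapses the ROC curve to three points, while `predict_proba` (or `decision_function`) gives a continuous score with many points.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Hypothetical data: one continuous feature, binary target.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 1))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Mistake: hard 0/1 predictions, i.e. the outcome at a fixed threshold.
labels = model.predict(X)
fpr_lab, tpr_lab, _ = roc_curve(y, labels)

# Fix: a continuous score for the positive class.
scores = model.predict_proba(X)[:, 1]
fpr_prob, tpr_prob, _ = roc_curve(y, scores)

print(len(fpr_lab), "points from labels,", len(fpr_prob), "points from probabilities")
```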



Source: https://stackoverflow.com/questions/30051284/plotting-a-roc-curve-in-scikit-yields-only-3-points
