TL;DR: scikit's roc_curve function is only returning 3 points for a certain dataset. Why could this be, and how do we control how many points we get back?
I had the same problem with a different example. My mistake was to pass the predicted outcomes for a given threshold, rather than the probabilities, as the y_score argument of roc_curve. That also gives a plot with three points, but it is a mistake!
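To illustrate the mistake described above, here is a minimal sketch. The synthetic dataset, the LogisticRegression model, and all variable names are assumptions made for illustration; they are not from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary problem (assumption: stands in for the real data)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Mistake: hard 0/1 predictions contain only two unique values,
# so the ROC "curve" has just three points
hard_labels = model.predict(X_test)
fpr_bad, tpr_bad, thr_bad = roc_curve(y_test, hard_labels)

# Intended usage: continuous scores for the positive class
scores = model.predict_proba(X_test)[:, 1]
fpr_ok, tpr_ok, thr_ok = roc_curve(y_test, scores)

print(len(fpr_bad), len(fpr_ok))
```

The first number printed should be 3, while the second should be noticeably larger, since the probabilities take many distinct values.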
You don't necessarily get only 1 point besides (0,0) and (1,1). I'm using the mushrooms dataset from Kaggle for a binary classification problem. Taking fpr and tpr from roc_curve, I get 4 more points, though their values are more or less the same.
fpr = [0, 0, 0.02290076, 0.0267176, 0.832061, 1]
tpr = [0, 0.0315361, 0.985758, 0.996948, 1, 1]
I'm not sure whether we can consider this as 1 point, because the curve plotted from these values looks like the one shown in the question.
I ran into the same problem, and after reading the documentation carefully I realized that the mistake is in:
probas_ = model.predict_log_proba(X)
Others had already hinted at this by suggesting to check the uniqueness of the scores. It should instead be:
probas_ = model.decision_function(X)
The number of points depends on the number of unique values in the input. Since the input vector has only 2 unique values, the function gives the correct output.
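As for controlling how many points come back, a rough sketch follows. The labels and scores here are randomly generated assumptions, not the asker's data. roc_curve has a drop_intermediate parameter (True by default) that removes points which would not change the shape of the plotted curve; setting it to False keeps one point per unique score, plus the extra starting point at (0,0).

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)

# Hypothetical labels and noisy continuous scores (assumption for illustration)
y_true = rng.integers(0, 2, size=200)
y_score = y_true + rng.normal(scale=1.0, size=200)

# The number of candidate thresholds is bounded by the unique score values
n_unique = len(np.unique(y_score))

# Default: intermediate points that do not change the curve's shape are dropped
fpr_default, tpr_default, thr_default = roc_curve(y_true, y_score)

# Keep every threshold: one per unique score, plus the added (0,0) start point
fpr_all, tpr_all, thr_all = roc_curve(y_true, y_score, drop_intermediate=False)

print(n_unique, len(thr_default), len(thr_all))
```

So if only 3 points come back, checking np.unique on the scores is a quick first diagnostic: with just 2 unique values, no setting of drop_intermediate can produce more points.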