I am trying to calculate roc_auc_score, but I am getting the following error:
"ValueError: Data is not binary and pos_label is not specified"
The problem is in y_true, which holds strings rather than numbers:
y_true = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])
Convert the values of y_true to Boolean:
y_true = (y_true == '1')
print(y_true)  # [False  True False False  True  True  True  True  True]
You only need to change y_true so that it looks like this:
y_true = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1])
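If the labels arrive as strings and cannot simply be retyped by hand, a conversion along the following lines may help. This is only a minimal sketch, and it assumes every label is a string that parses as an integer; the name y_true_int is introduced here for illustration.

```python
import numpy as np

# Labels as they appear in the question: strings, not numbers.
y_true = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])

# Assumption: every label is a string such as '0' or '1' that parses as an int.
# astype(int) converts each element to an integer label.
y_true_int = y_true.astype(int)

print(y_true_int.tolist())  # [0, 1, 0, 0, 1, 1, 1, 1, 1]
```

The converted array can then be passed to roc_auc_score in place of the original string array.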
Explanation:
If you look at what the roc_auc_score function does in https://github.com/scikit-learn/scikit-learn/blob/0.15.X/sklearn/metrics/metrics.py, you will see that y_true is evaluated as follows:
classes = np.unique(y_true)
if (pos_label is None and not (np.all(classes == [0, 1]) or
                               np.all(classes == [-1, 1]) or
                               np.all(classes == [0]) or
                               np.all(classes == [-1]) or
                               np.all(classes == [1]))):
    raise ValueError("Data is not binary and pos_label is not specified")
At the moment of execution pos_label is None. Because you define y_true as an array of strings, none of the np.all comparisons against the integer label sets can succeed, so every one of them is False. Their combined result is negated, so the if condition is True and the exception is raised.
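The effect can be reproduced without scikit-learn. The sketch below uses a hypothetical helper, passes_binary_check, that only mirrors the comparisons quoted above; it is not part of scikit-learn, and it assumes the input has at most two distinct labels.

```python
import numpy as np

def passes_binary_check(y_true):
    """Hypothetical helper mirroring the label comparisons quoted above.

    Returns True when the unique labels match one of the accepted sets.
    Assumes y_true contains at most two distinct labels.
    """
    classes = np.unique(y_true)
    accepted = ([0, 1], [-1, 1], [0], [-1], [1])
    return any(bool(np.all(classes == labels)) for labels in accepted)

# String labels never equal the integer sets, so the check fails.
print(passes_binary_check(np.array(['0', '1', '0', '0', '1'])))  # False

# Integer labels match [0, 1], so the check passes.
print(passes_binary_check(np.array([0, 1, 0, 0, 1])))  # True
```

Depending on the NumPy version, comparing a string array with integers may also emit a warning, but the comparison does not evaluate as equal in either case.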