Question
I am training a RandomForestClassifier (scikit-learn) to predict credit card fraud. When I test the model and check the ROC AUC score, I get different values depending on whether I use roc_auc_score or plot_roc_curve: roc_auc_score gives me around 0.89, while plot_roc_curve reports an AUC of 0.96. Why is that?
The labels are all 0 or 1, and the predictions are also 0 or 1. Code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, plot_roc_curve
import matplotlib.pyplot as plt

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train[target].values)
pred_test = clf.predict(X_test)
print(roc_auc_score(y_test, pred_test))
clf_disp = plot_roc_curve(clf, X_test, y_test)
plt.show()
Output of the code (the roc_auc_score is printed just above the graph).
Answer 1:
The ROC curve and roc_auc_score both take prediction probabilities as input, but your code is passing the predicted class labels to roc_auc_score. You need to fix that.
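A minimal sketch of the effect (on a synthetic imbalanced dataset via make_classification, since the asker's data isn't available), comparing the AUC computed from hard labels against the AUC computed from probabilities:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the fraud dataset
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Hard 0/1 predictions collapse the ranking to two values, so the ROC
# "curve" is a single point and the AUC is understated
auc_labels = roc_auc_score(y_test, clf.predict(X_test))

# Positive-class probabilities preserve the full ranking of the samples
auc_probs = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(auc_labels)  # typically lower, as with the 0.89 in the question
print(auc_probs)   # higher, matching what plot_roc_curve reports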
Answer 2:
You are feeding the predicted classes instead of the predicted probabilities to roc_auc_score.
From the documentation:
y_score: array-like of shape (n_samples,) or (n_samples, n_classes)
Target scores. In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers).
Change your code to:

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train[target].values)
# predict_proba returns one column per class; take the positive-class column
y_score = clf.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, y_score))
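For completeness: plot_roc_curve computes the curve from the classifier's predict_proba (or decision_function) output internally, which explains the higher value (0.96) in the plot. Note also that plot_roc_curve was deprecated in scikit-learn 1.0 and removed in 1.2; on recent versions the rough equivalent is:

import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# from_estimator likewise obtains the scores from the fitted classifier itself
RocCurveDisplay.from_estimator(clf, X_test, y_test)
plt.show()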
Source: https://stackoverflow.com/questions/60615281/different-result-roc-auc-score-and-plot-roc-curve