Using a confusion matrix as the scoring metric in cross-validation in scikit-learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit-learn:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

pipeline = Pipeline([
    ('bow', CountVectorizer()),       # turn raw text into bag-of-words counts
    ('classifier', BernoulliNB()),    # Bernoulli Naive Bayes on those counts
])

and I would like to evaluate it with cross-validation, using the confusion matrix as the scoring metric. How can I do that?

5 Answers
北荒 2021-01-31 11:45

    A scorer must return a single number, and a confusion matrix is not one, so it cannot serve as a scoring metric directly. What you can do, though, is define one scorer per entry of the confusion matrix. See here [link]. Citing the code:

    from sklearn.metrics import confusion_matrix, make_scorer
    from sklearn.model_selection import cross_validate

    # scikit-learn's convention: C[i, j] counts samples with true label i
    # that were predicted as label j, so for binary labels [0, 1]:
    def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
    def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
    def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
    def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]

    scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
               'fp': make_scorer(fp), 'fn': make_scorer(fn)}
    cv_results = cross_validate(svm, X, y, scoring=scoring)  # svm: any estimator instance, e.g. SVC()
    

    This runs the cross-validation once and evaluates all four scorers on each split, returning the dictionary cv_results with keys test_tp, test_tn, test_fp and test_fn, each holding that confusion-matrix entry for every cross-validation split.

    From these you could reconstruct an average confusion matrix, although the cross_val_predict approach in Xema's answer seems more elegant for that; both are sketched below.
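    A minimal sketch of both options, assuming the cv_results and svm from the snippet above plus NumPy (all names illustrative):

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict

    # Option 1: average the per-fold entries back into one 2x2 matrix
    # (each test_* key holds one count per split).
    mean_cm = np.array([
        [cv_results['test_tn'].mean(), cv_results['test_fp'].mean()],
        [cv_results['test_fn'].mean(), cv_results['test_tp'].mean()],
    ])

    # Option 2 (Xema's approach): pool all out-of-fold predictions and
    # build a single confusion matrix from them.
    y_pred = cross_val_predict(svm, X, y)
    pooled_cm = confusion_matrix(y, y_pred)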

    Note that this will not work with cross_val_score, which accepts only a single scorer; the dictionary of four scorers needs cross_validate (introduced in scikit-learn v0.19).
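    To illustrate the distinction (a sketch, reusing the tp scorer defined above):

    from sklearn.model_selection import cross_val_score

    # Fine: one confusion-matrix entry as a single metric.
    tp_per_fold = cross_val_score(svm, X, y, scoring=make_scorer(tp))

    # Not possible: cross_val_score has no multi-metric interface, so
    # scoring={'tp': ..., 'fn': ...} has to go through cross_validate.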

    Side note: you could use one of these scorers (i.e. one element of the matrix) for hyper-parameter optimization via grid search.
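    A hedged sketch of that idea, assuming the question's pipeline and a purely illustrative grid over the Naive Bayes smoothing parameter; GridSearchCV maximizes its score, so greater_is_better=False turns "few false positives" into the optimization target:

    from sklearn.model_selection import GridSearchCV

    # Negate fp so that fewer false positives means a higher score.
    fp_scorer = make_scorer(fp, greater_is_better=False)

    grid = GridSearchCV(pipeline,
                        param_grid={'classifier__alpha': [0.1, 0.5, 1.0]},  # illustrative values
                        scoring=fp_scorer)
    grid.fit(X, y)
    print(grid.best_params_)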

    *EDIT: fixed the indices above; with scikit-learn's convention, true negatives are at [0, 0] and true positives at [1, 1].
