using confusion matrix as scoring metric in cross validation in scikit learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit learn,

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])

and I would like to use the confusion matrix as the scoring metric in cross validation. Is there a way to do that with scikit-learn's cross validation tools?

5 Answers
  •  梦毁少年i
    2021-01-31 11:40

    The short answer is "you cannot".

    You need to understand the difference between cross_val_score and cross validation as a model selection method. cross_val_score, as the name suggests, works only on scores. A confusion matrix is not a score; it is a summary of what happened during evaluation. The key distinction is that a score must be an orderable object, in scikit-learn specifically a float, so that you can tell method b is better than method a simply by checking whether b has the bigger score. You cannot do that with a confusion matrix, which, again as the name suggests, is a matrix.
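    To illustrate the point above, here is a minimal sketch (my own example, not from the original answer): any scalar summary derived from the confusion matrix, such as specificity, can be wrapped with make_scorer and then used with cross_val_score, because it returns a single orderable float per fold. The dataset and classifier are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

def specificity(y_true, y_pred):
    # A scalar derived from the confusion matrix: TN / (TN + FP).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tn / (tn + fp)

# Placeholder data and model, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(BernoulliNB(), X, y, cv=5,
                         scoring=make_scorer(specificity))
print(scores)  # one float per fold -- orderable, unlike a matrix
```

    This works precisely because the scorer collapses the matrix into one number; the full matrix itself cannot be used as a scoring metric.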

    If you want to obtain confusion matrices for multiple evaluation runs (such as cross validation), you have to do it by hand, which is not that bad in scikit-learn - it is actually just a few lines of code.

    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import KFold

    kf = KFold(n_splits=5)
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        model.fit(X_train, y_train)
        print(confusion_matrix(y_test, model.predict(X_test)))
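
    If a single aggregate matrix over all folds is acceptable (an assumption on my part; the loop above instead prints one matrix per fold), a shorter alternative is cross_val_predict, which collects the out-of-fold predictions so you can call confusion_matrix once. The dataset and classifier below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import BernoulliNB

# Placeholder data and model, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)

# Each sample is predicted by the model trained on the folds
# that did not contain it; the result covers the whole dataset.
y_pred = cross_val_predict(BernoulliNB(), X, y, cv=5)
cm = confusion_matrix(y, y_pred)
print(cm)  # a single 2x2 matrix aggregated over all folds
```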
    
