Using confusion matrix as scoring metric in cross validation in scikit-learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit-learn:

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])

a

5 answers

遇见更好的自我 (original poster)

2021-01-31 12:05
I think what you really want is the average of the confusion matrices obtained from each cross-validation run. @lejlot already explained why, so I'll just extend his answer with the calculation of the mean of the confusion matrices:

Calculate the confusion matrix in each run of cross-validation. You can use something like this:
```
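import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold

# X and y are NumPy arrays; model is any scikit-learn estimator (e.g. the pipeline).
conf_matrix_list_of_arrays = []
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit on the training fold, then evaluate on the held-out fold.
    model.fit(X_train, y_train)
    # Fixing the labels keeps every matrix the same shape, even if a fold lacks a class.
    conf_matrix = confusion_matrix(y_test, model.predict(X_test), labels=np.unique(y))
    conf_matrix_list_of_arrays.append(conf_matrix)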
```
At the end, you can calculate the mean of the list of NumPy arrays (confusion matrices) with:
```
mean_of_conf_matrix_arrays = np.mean(conf_matrix_list_of_arrays, axis=0)
```
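For anyone who wants to try this end to end, here is a minimal sketch that applies the same idea to the pipeline from the question. The toy corpus and labels are invented purely for illustration, and `StratifiedKFold` is my own substitution for plain `KFold` so that every fold contains both classes; treat it as one possible way to do it, not the definitive one.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, made up for this example only.
X = np.array([
    "free prize money now", "win cash prize today", "claim free cash now",
    "cheap money offer today", "free offer win prize",
    "meeting at noon tomorrow", "lunch with the team", "project review at noon",
    "see you at the meeting", "team lunch tomorrow",
])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Same pipeline as in the question.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("classifier", BernoulliNB()),
])

labels = np.unique(y)  # fixed label order so all matrices share one shape
conf_matrices = []
skf = StratifiedKFold(n_splits=5)  # assumption: stratified folds instead of plain KFold
for train_index, test_index in skf.split(X, y):
    # Refit the whole pipeline on each training fold.
    pipeline.fit(X[train_index], y[train_index])
    y_pred = pipeline.predict(X[test_index])
    conf_matrices.append(confusion_matrix(y[test_index], y_pred, labels=labels))

# Element-wise mean over the per-fold confusion matrices.
mean_conf_matrix = np.mean(conf_matrices, axis=0)
print(mean_conf_matrix)
```

Each entry of `mean_conf_matrix` is the average count per fold, so the entries sum to the average test-fold size rather than to the total number of samples. Use `np.sum(conf_matrices, axis=0)` instead if you want totals over all folds.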