using confusion matrix as scoring metric in cross validation in scikit learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit learn,

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])

and I want to use the confusion matrix as the scoring metric during cross-validation. How can I do that?

5 Answers
  •  遇见更好的自我
    2021-01-31 12:05

    I think what you really want is the average of the confusion matrices obtained from each cross-validation run. @lejlot already explained why nicely; I'll just extend his answer with the calculation of the mean of the confusion matrices:

    Calculate the confusion matrix in each run of cross-validation. You can use something like this (note that the old `cross_validation.KFold(len(y), n_folds=5)` API is deprecated; use `sklearn.model_selection.KFold` and iterate over `kf.split(X)` instead):

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.metrics import confusion_matrix

    conf_matrix_list_of_arrays = []
    kf = KFold(n_splits=5)
    for train_index, test_index in kf.split(X):

       X_train, X_test = X[train_index], X[test_index]
       y_train, y_test = y[train_index], y[test_index]

       model.fit(X_train, y_train)
       conf_matrix = confusion_matrix(y_test, model.predict(X_test))
       conf_matrix_list_of_arrays.append(conf_matrix)


    In the end you can compute the mean of the list of numpy arrays (confusion matrices) with:

    mean_of_conf_matrix_arrays = np.mean(conf_matrix_list_of_arrays, axis=0)
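
    Putting the steps above together, here is a minimal runnable sketch. The synthetic `X`, `y`, and the `BernoulliNB` model stand in for the asker's own data and pipeline, which are assumptions here; fixing `labels=[0, 1]` keeps every per-fold matrix the same shape so the mean is well defined:

    ```python
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.metrics import confusion_matrix

    # Synthetic binary data standing in for the asker's X and y (assumption).
    rng = np.random.RandomState(0)
    X = rng.randint(0, 2, size=(100, 10))
    y = rng.randint(0, 2, size=100)

    model = BernoulliNB()
    conf_matrix_list_of_arrays = []

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        model.fit(X_train, y_train)
        # Fixing labels guarantees a 2x2 matrix even if a fold misses a class.
        conf_matrix_list_of_arrays.append(
            confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
        )

    # Element-wise mean over the five per-fold confusion matrices.
    mean_of_conf_matrix_arrays = np.mean(conf_matrix_list_of_arrays, axis=0)
    print(mean_of_conf_matrix_arrays)
    ```

    Since each of the 5 folds holds out 20 of the 100 samples, the entries of the averaged matrix sum to 20.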
    
