Using a confusion matrix as the scoring metric in cross-validation in scikit-learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit-learn:

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])

a

5 answers
  •  梦毁少年i
    2021-01-31 11:40

    The short answer is "you cannot".

    You need to understand the difference between cross_val_score and cross-validation as a model selection method. cross_val_score, as the name suggests, works only on scores. A confusion matrix is not a score; it is a summary of what happened during evaluation. The key distinction is that a score must be an orderable object, and in scikit-learn specifically a float. Given a score, you can tell whether method b is better than method a simply by checking whether b's score is higher. You cannot do this with a confusion matrix, which, again as the name suggests, is a matrix.
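To make the distinction concrete, here is a minimal sketch. The synthetic data from make_classification and the second model (LogisticRegression) are assumptions added for illustration; neither appears in the question. It shows that cross_val_score yields one float per fold, which can be averaged and compared, while confusion_matrix yields a 2-D array of counts.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic data, purely for illustration (not from the question).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# cross_val_score returns one float per fold, so mean scores compare directly.
scores_a = cross_val_score(BernoulliNB(), X, y, cv=5, scoring="accuracy")
scores_b = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy"
)
better = "b" if scores_b.mean() > scores_a.mean() else "a"
print("higher mean accuracy:", better)

# A confusion matrix is a 2-D array of counts; there is no single way to
# say that one such matrix is "bigger" than another.
cm = confusion_matrix(y, BernoulliNB().fit(X, y).predict(X))
print(cm.shape)
```

Here the matrix is computed on the training data only to show its shape; it is not meant as an evaluation.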

    If you want confusion matrices for multiple evaluation runs (such as the folds of a cross-validation), you have to compute them by hand. That is not hard in scikit-learn; it takes only a few lines of code.

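    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import KFold

    # X and y are NumPy arrays; model is any scikit-learn estimator.
    kf = KFold(n_splits=5)
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # Refit on this fold's training part, then print its confusion matrix.
        model.fit(X_train, y_train)
        print(confusion_matrix(y_test, model.predict(X_test)))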
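If a single matrix over all folds is enough, one possible alternative to the manual loop is cross_val_predict, which returns the out-of-fold prediction for every sample. The sketch below applies it to the pipeline from the question. The toy corpus, the labels, and the choice of StratifiedKFold with shuffling are assumptions made up for illustration, not part of the original post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Toy corpus and labels, invented for illustration only.
texts = [
    "cheap pills buy now", "win money fast", "free prize click here",
    "limited offer buy cheap", "claim your free money",
    "meeting moved to monday", "lunch at noon tomorrow", "see attached report",
    "project review on friday", "call me after the meeting",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("classifier", BernoulliNB()),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each sample is predicted exactly once, by a model fitted on the folds
# that do not contain it.
predicted = cross_val_predict(pipeline, texts, labels, cv=cv)

# One confusion matrix pooled over all out-of-fold predictions.
pooled = confusion_matrix(labels, predicted, labels=[0, 1])
print(pooled)
```

The pooled matrix is still a summary, not a score, so it cannot be passed as the scoring argument of cross_val_score.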
    
