I am creating a pipeline in scikit-learn:
pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])
I think what you really want is the average of the confusion matrices obtained from each cross-validation run. @lejlot already explained nicely why; I'll just extend his answer by computing the mean of the confusion matrices.
First, calculate the confusion matrix in each run of cross-validation. You can use something like this:
conf_matrix_list_of_arrays = []
kf = cross_validation.KFold(len(y), n_folds=5)
for train_index, test_index in kf:
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(X_train, y_train)
    conf_matrix = confusion_matrix(y_test, model.predict(X_test))
    conf_matrix_list_of_arrays.append(conf_matrix)
At the end, you can compute the element-wise mean of the list of NumPy arrays (the confusion matrices) with:
mean_of_conf_matrix_arrays = np.mean(conf_matrix_list_of_arrays, axis=0)
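For reference, here is a minimal, self-contained sketch of the same idea for recent scikit-learn releases, where `KFold` lives in `sklearn.model_selection` and the folds come from `.split()`. The helper name `mean_confusion_matrix`, the toy texts and labels, and the choice to shuffle with a fixed seed are my own assumptions for illustration, not part of the question. Passing `labels=` to `confusion_matrix` keeps every per-fold matrix the same shape even if a fold happens to miss a class, which `np.mean(..., axis=0)` needs.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline


def mean_confusion_matrix(model, X, y, n_splits=5, seed=0):
    """Average the per-fold confusion matrices of `model` (hypothetical helper)."""
    X = np.asarray(X, dtype=object)  # allows index arrays on a list of strings
    y = np.asarray(y)
    labels = np.unique(y)  # fixed label order, so every fold gives the same shape
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    matrices = []
    for train_index, test_index in kf.split(X):
        model.fit(X[train_index], y[train_index])
        predictions = model.predict(X[test_index])
        matrices.append(confusion_matrix(y[test_index], predictions, labels=labels))
    # Element-wise mean over the list of (n_labels, n_labels) arrays
    return np.mean(matrices, axis=0)


# Toy data, invented purely for illustration
texts = ["good movie", "great film", "loved it", "nice plot", "great acting",
         "bad movie", "awful film", "hated it", "boring plot", "bad acting"]
targets = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("classifier", BernoulliNB()),
])

mean_cm = mean_confusion_matrix(pipeline, texts, targets, n_splits=5)
print(mean_cm)
```

Each entry of `mean_cm` is the average count per fold, so the entries sum to the average test-fold size (here 2 samples per fold). If you would rather have total counts over all folds, use `np.sum` instead of `np.mean`.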