Sklearn confusion matrix estimation by cross validation

旧时模样 submitted on 2019-12-12 03:32:16

Question


I am trying to estimate the confusion matrix of a classifier using 10-fold cross-validation with sklearn.

To compute the confusion matrix I am using sklearn.metrics.confusion_matrix. I know that I can evaluate a model with cross-validation using sklearn.model_selection.cross_val_score and sklearn.metrics.make_scorer, like this:

from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score
cm = cross_val_score(clf, X, y, scoring=make_scorer(confusion_matrix))

Here clf is my classifier, and X and y are the feature matrix and class vector. However, this raises an error, because confusion_matrix returns a matrix rather than a single float.

I've tried doing something like:

import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold


def cv_confusion_matrix(clf, X, y, folds=10):
    skf = StratifiedKFold(n_splits=folds)
    cv_iter = skf.split(X, y)
    cms = []

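    for train, test in cv_iter:
        # Fit on the training fold, then evaluate on the held-out fold
        clf.fit(X[train], y[train])
        cm = confusion_matrix(y[test], clf.predict(X[test]), labels=clf.classes_)
        cms.append(cm)
    # Average over folds (axis 0), not over the rows of each matrix
    return np.mean(np.array(cms), axis=0)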

This works, but I am missing the parallelism that sklearn provides through cross_val_score and its n_jobs parameter.

Is there any way to do this and take advantage of the parallelism?
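
One possible direction, offered as a sketch rather than a definitive answer: sklearn.model_selection.cross_val_predict accepts n_jobs and returns one out-of-fold prediction per sample, so the folds can be fitted in parallel and the confusion matrix computed once at the end. Because each sample lands in exactly one test fold, the matrix over all predictions is the sum of the per-fold matrices, and dividing it by the number of folds gives the per-fold mean. The function name cv_confusion_matrix_parallel, the iris data, and the LogisticRegression classifier are assumptions chosen only for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def cv_confusion_matrix_parallel(clf, X, y, folds=10, n_jobs=None):
    """Mean per-fold confusion matrix, with folds fitted in parallel.

    Hypothetical helper: the name and signature are illustrative.
    """
    skf = StratifiedKFold(n_splits=folds)
    # Each sample is predicted exactly once, by the model that did not train on it
    y_pred = cross_val_predict(clf, X, y, cv=skf, n_jobs=n_jobs)
    # Fix the label order so the matrix layout does not depend on the predictions
    labels = np.unique(y)
    total_cm = confusion_matrix(y, y_pred, labels=labels)
    # Sum over folds divided by the fold count equals the mean per-fold matrix
    return total_cm / folds


# Example data and classifier (assumptions, not from the question)
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
mean_cm = cv_confusion_matrix_parallel(clf, X, y, folds=10, n_jobs=2)
print(mean_cm)
```

Skipping the final division returns the summed matrix instead, which is often easier to read because its entries are whole sample counts.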

Source: https://stackoverflow.com/questions/42669055/sklearn-confusion-matrix-estimation-by-cross-validation
