Confusion matrix and classification report of StratifiedKFold

Question

I am using StratifiedKFold to check the performance of my classifier. I have two classes and I am trying to build a Logistic Regression classifier. Here is my code:

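import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# x: NumPy array of raw text documents, y: NumPy array of binary labels
r = []  # per-fold accuracy scores
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit TF-IDF on the training fold only, then apply it to the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    score = accuracy_score(y_test, y_pred)
    r.append(score)
    print(score)

print(np.mean(r))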

I can print the accuracy score, but I couldn't figure out how to print the confusion matrix and classification report. If I just add a print statement inside the loop,

print(confusion_matrix(y_test, y_pred))

it will print the matrix 10 times, once per fold, but I want a single report and a single matrix for the final performance of the classifier.

Any help on how to calculate the matrix and the report would be appreciated. Thanks.

Answer 1:

Cross-validation is used to assess the performance of a particular model or set of hyperparameters across different splits of a dataset. At the end you don't have a single final performance per se; you have the individual performance on each split and the aggregated performance across splits. You could use the tn, fn, fp and tp counts from each split to compute an aggregated precision, recall, sensitivity, etc., but you could also just use the predefined functions for those metrics in sklearn and aggregate them at the end.

e.g.

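import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

# x: NumPy array of raw text documents, y: NumPy array of binary labels
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accs, precs, recs = [], [], []  # per-fold metrics
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit TF-IDF on the training fold only, then apply it to the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    accs.append(acc)
    precs.append(prec)
    recs.append(rec)
    print(f'Accuracy: {acc}, Precision: {prec}, Recall: {rec}')

# Aggregate the per-fold metrics by averaging them
print(f'Mean Accuracy: {np.mean(accs)}, Mean Precision: {np.mean(precs)}, Mean Recall: {np.mean(recs)}')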
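
If you do want one matrix and one report, as the question asks, one possible approach is to sum the per-fold confusion matrices (i.e. pool the tn, fp, fn, tp counts) and to build the classification report from the pooled out-of-fold predictions. The sketch below is only an illustration under that assumption: the helper name pooled_cv_report and the toy texts and labels are hypothetical and not part of the original post, and the pooled figures are not identical to the mean of the per-fold metrics shown above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold


def pooled_cv_report(x, y, n_splits=10, random_state=0):
    """Hypothetical helper: sum per-fold confusion matrices and build one
    classification report from the pooled out-of-fold predictions."""
    x, y = np.asarray(x), np.asarray(y)
    labels = np.unique(y)  # fixed label order so every fold's matrix lines up
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True,
                          random_state=random_state)

    total_cm = np.zeros((len(labels), len(labels)), dtype=int)
    y_true_all, y_pred_all = [], []

    for train_index, test_index in skf.split(x, y):
        # Fit TF-IDF on the training fold only, as in the code above
        tfidf = TfidfVectorizer()
        x_train = tfidf.fit_transform(x[train_index])
        x_test = tfidf.transform(x[test_index])

        clf = LogisticRegression(class_weight='balanced')
        clf.fit(x_train, y[train_index])
        y_pred = clf.predict(x_test)

        # Accumulate this fold's counts and keep its predictions
        total_cm += confusion_matrix(y[test_index], y_pred, labels=labels)
        y_true_all.extend(y[test_index])
        y_pred_all.extend(y_pred)

    report = classification_report(y_true_all, y_pred_all,
                                   labels=labels, zero_division=0)
    return total_cm, report


# Toy data, made up purely for illustration
texts = ["good great fine", "great good nice", "nice fine good",
         "fine great nice", "bad awful poor", "awful poor bad",
         "poor bad terrible", "terrible awful bad"]
targets = [1, 1, 1, 1, 0, 0, 0, 0]

cm, report = pooled_cv_report(texts, targets, n_splits=2)
print(cm)      # rows are true labels, columns are predicted labels
print(report)
```

Because every sample lands in exactly one test fold, the entries of the summed matrix add up to the number of samples, so it reads like a single confusion matrix over the whole dataset. Keep in mind that each prediction comes from a different fitted model, so this summarizes the cross-validation procedure rather than one final classifier.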

Source: https://stackoverflow.com/questions/55003149/confusion-matrix-and-classification-report-of-stratifiedkfold

Tags

python

machine-learning

scikit-learn

confusion-matrix