scikit learn output metrics.classification_report into CSV/tab-delimited format

后端未结

关注

 17  2212

青春惊慌失措 2021-01-31 03:08

I\'m doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here\'s an extr

17条回答

说谎 (楼主)

2021-01-31 03:50

As mentioned in one of the posts in here, precision_recall_fscore_support is analogous to classification_report.

Then it suffices to use python library pandas to easily format the data in a columnar format, similar to what classification_report does. Here is an example:

import numpy as np
import pandas as pd

from sklearn.metrics import classification_report
from  sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)

y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    metrics_summary = precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred)

    avg = list(precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred,
            average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)

    support = class_report_df.loc['support']
    total = support.sum() 
    avg[-1] = total

    class_report_df['avg / total'] = avg

    return class_report_df.T

With classification_report You'll get something like:

print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))

Output:

             precision    recall  f1-score   support

          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600

avg / total   0.499232  0.480000  0.485248      1000

Then with our custom funtion pandas_classification_report:

df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)

Output:

             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0

Then just save it to csv format (refer to here for other separator formating like sep=';'):

df_class_report.to_csv('my_csv_file.csv',  sep=',')

I open my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor like excel):

0 讨论(0)

查看其它17个回答