scikit learn output metrics.classification_report into CSV/tab-delimited format

青春惊慌失措 2021-01-31 03:08

I'm doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here's an extr

17 answers
  •  说谎
    说谎 (OP)
    2021-01-31 03:50

    As mentioned in one of the other posts here, precision_recall_fscore_support is analogous to classification_report: it computes the same per-class metrics, but returns them as arrays instead of a formatted string.

    It then suffices to use the pandas library to arrange the data in a columnar format, similar to what classification_report does. Here is an example:

    import numpy as np
    import pandas as pd
    
    from sklearn.metrics import classification_report
    from sklearn.metrics import precision_recall_fscore_support
    
    np.random.seed(0)
    
    y_true = np.array([0]*400 + [1]*600)
    y_pred = np.random.randint(2, size=1000)
    
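    def pandas_classification_report(y_true, y_pred):
        # Per-class precision, recall, F1 and support (one array per metric).
        metrics_summary = precision_recall_fscore_support(
            y_true=y_true,
            y_pred=y_pred)
    
        # Support-weighted averages; the support slot is None when averaging.
        avg = list(precision_recall_fscore_support(
            y_true=y_true,
            y_pred=y_pred,
            average='weighted'))
    
        metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
        class_report_df = pd.DataFrame(
            list(metrics_summary),
            index=metrics_sum_index)
    
        # Replace the missing support value with the total number of samples.
        support = class_report_df.loc['support']
        total = support.sum()
        avg[-1] = total
    
        class_report_df['avg / total'] = avg
    
        # Transpose so that classes are rows and metrics are columns.
        return class_report_df.T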
    

    With classification_report you'll get something like:

    print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))
    

    Output:

                 precision    recall  f1-score   support
    
              0   0.379032  0.470000  0.419643       400
              1   0.579365  0.486667  0.528986       600
    
    avg / total   0.499232  0.480000  0.485248      1000
    

    Then with our custom function pandas_classification_report:

    df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
    print(df_class_report)
    

    Output:

                 precision    recall  f1-score  support
    0             0.379032  0.470000  0.419643    400.0
    1             0.579365  0.486667  0.528986    600.0
    avg / total   0.499232  0.480000  0.485248   1000.0
    

    Then just save it in CSV format (pass a different sep, such as sep=';' or sep='\t', to use another delimiter):

    df_class_report.to_csv('my_csv_file.csv',  sep=',')
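    Since the question also asks for tab-delimited output, here is a minimal sketch of the same export with sep='\t'. The DataFrame is rebuilt by hand from the numbers printed above so that the snippet runs on its own; the index_label value "class" and the in-memory buffer are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd

# Values copied from the report printed above; the frame is built by hand
# here only so that the snippet is self-contained.
df_class_report = pd.DataFrame(
    {
        "precision": [0.379032, 0.579365, 0.499232],
        "recall": [0.470000, 0.486667, 0.480000],
        "f1-score": [0.419643, 0.528986, 0.485248],
        "support": [400, 600, 1000],
    },
    index=["0", "1", "avg / total"],
)

# Write tab-delimited text into a buffer; pass a file path instead of the
# buffer to write to disk. "class" is a hypothetical name for the index column.
buffer = io.StringIO()
df_class_report.to_csv(buffer, sep="\t", index_label="class")
tsv_text = buffer.getvalue()

# Read the text back with the same separator to check the round trip.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="class")
print(tsv_text)
```

    Naming the index column keeps the header row aligned with the data rows when the file is opened in a spreadsheet.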
    

    I opened my_csv_file.csv with LibreOffice Calc, although any spreadsheet editor such as Excel would work.
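    As an alternative sketch, and assuming a scikit-learn version recent enough that classification_report accepts output_dict=True (0.20 or later, as far as I know), the report can be turned into a DataFrame directly. Note that the row labels differ between versions: newer releases report 'macro avg' and 'weighted avg' rather than the 'avg / total' row shown above.

```python
import io

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

# Same shape of toy data as in the answer: 400 zeros, 600 ones,
# and random binary predictions.
rng = np.random.RandomState(0)
y_true = np.array([0] * 400 + [1] * 600)
y_pred = rng.randint(2, size=1000)

# output_dict=True returns a nested dict instead of a formatted string
# (assumption: scikit-learn >= 0.20).
report_dict = classification_report(y_true, y_pred, output_dict=True)

# Transpose so that classes/averages are rows and metrics are columns.
report_df = pd.DataFrame(report_dict).T

# Tab-delimited export into a buffer; pass a file path to write to disk.
buffer = io.StringIO()
report_df.to_csv(buffer, sep="\t")
print(buffer.getvalue())
```

    This avoids computing the averages by hand, at the cost of depending on whichever row labels the installed scikit-learn version emits.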
