scikit learn output metrics.classification_report into CSV/tab-delimited format

青春惊慌失措 2021-01-31 03:08

I'm doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here's an extr

17 answers
  • 2021-01-31 03:48

    The way I have always solved output problems is, as I mentioned in my previous comment, to convert the output to a DataFrame. A DataFrame is easy to write to a file, and pandas makes the data structure easy to manipulate. The other approach I have used is to write the output line by line with the csv module, specifically with writerow.

    If you manage to get the output into a DataFrame, it would be

    dataframe_name_here.to_csv()
    

    or, if you use the csv module, something like the writerow example in the csv documentation.
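
    The line-by-line writerow approach mentioned above could look roughly like the sketch below. It uses only the standard library. The `report` dict, its labels and its numbers are made up for illustration (shaped like what classification_report returns with output_dict=True), and `write_report` is a hypothetical helper name, not part of scikit-learn.

```python
import csv
import io

# Hypothetical per-class metrics, shaped like the dict returned by
# classification_report(..., output_dict=True); the numbers are made up.
report = {
    "spam": {"precision": 0.5, "recall": 0.25, "f1-score": 0.33, "support": 4},
    "ham": {"precision": 0.75, "recall": 0.9, "f1-score": 0.82, "support": 10},
}
columns = ["precision", "recall", "f1-score", "support"]


def write_report(report, fileobj, delimiter=","):
    """Write one header row, then one row per label, using csv.writer."""
    writer = csv.writer(fileobj, delimiter=delimiter)
    writer.writerow(["label"] + columns)
    for label, scores in report.items():
        writer.writerow([label] + [scores[c] for c in columns])


# StringIO stands in for open("report.csv", "w", newline="").
buffer = io.StringIO()
write_report(report, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

    Passing delimiter="\t" to the helper should give the tab-delimited variant instead.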

  • 2021-01-31 03:50

    Just import pandas as pd and set the output_dict parameter, which is False by default, to True when computing the classification_report. This returns the report as a dictionary, which you can pass to the pandas DataFrame constructor. You may want to transpose the resulting DataFrame to fit the output format that you want. The resulting DataFrame can then be written to a CSV file.

    import pandas as pd
    from sklearn.metrics import classification_report

    clsf_report = pd.DataFrame(classification_report(y_true=your_y_true, y_pred=your_y_preds, output_dict=True)).transpose()
    clsf_report.to_csv('Your Classification Report Name.csv', index=True)
    

    I hope this helps.
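
    Since the question also asks about tab-delimited output, here is a minimal sketch of the same idea with sep="\t". The toy labels are invented for illustration. It assumes a scikit-learn version whose output_dict result contains an "accuracy" key holding a single float (older releases may not have it, hence the default of None in pop); that key is removed before building the table so every row has the same four columns.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, made up purely for illustration.
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "bird", "dog", "cat"]

report = classification_report(y_true, y_pred, output_dict=True)

# "accuracy" maps to a single float rather than a per-metric dict,
# so pull it out before building the table (None if the key is absent).
accuracy = report.pop("accuracy", None)

# Rows become labels and averages, columns become the four metrics.
report_df = pd.DataFrame(report).transpose()

# With no path argument, to_csv returns the text instead of writing a file.
tsv_text = report_df.to_csv(sep="\t", index_label="label")
print(tsv_text)
```

    To write a file instead, pass a path as the first argument to to_csv.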

  • 2021-01-31 03:50

    As mentioned in one of the other posts here, precision_recall_fscore_support returns the same metrics as classification_report.

    You can then use the pandas library to lay the data out in columns, similar to what classification_report does. Here is an example:

    import numpy as np
    import pandas as pd
    
    from sklearn.metrics import classification_report
    from sklearn.metrics import precision_recall_fscore_support
    
    np.random.seed(0)
    
    y_true = np.array([0]*400 + [1]*600)
    y_pred = np.random.randint(2, size=1000)
    
    def pandas_classification_report(y_true, y_pred):
        metrics_summary = precision_recall_fscore_support(
                y_true=y_true, 
                y_pred=y_pred)
    
        avg = list(precision_recall_fscore_support(
                y_true=y_true, 
                y_pred=y_pred,
                average='weighted'))
    
        metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
        class_report_df = pd.DataFrame(
            list(metrics_summary),
            index=metrics_sum_index)
    
        support = class_report_df.loc['support']
        total = support.sum() 
        avg[-1] = total
    
        class_report_df['avg / total'] = avg
    
        return class_report_df.T
    

    With classification_report you'll get something like:

    print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))
    

    Output:

                 precision    recall  f1-score   support
    
              0   0.379032  0.470000  0.419643       400
              1   0.579365  0.486667  0.528986       600
    
    avg / total   0.499232  0.480000  0.485248      1000
    

    Then with our custom function pandas_classification_report:

    df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
    print(df_class_report)
    

    Output:

                 precision    recall  f1-score  support
    0             0.379032  0.470000  0.419643    400.0
    1             0.579365  0.486667  0.528986    600.0
    avg / total   0.499232  0.480000  0.485248   1000.0
    

    Then just save it in CSV format (see the DataFrame.to_csv documentation for other separators, such as sep=';'):

    df_class_report.to_csv('my_csv_file.csv',  sep=',')
    

    I opened my_csv_file.csv with LibreOffice Calc, although any spreadsheet editor such as Excel would work.

  • 2021-01-31 03:56

    It's obviously a better idea to just output the classification report as a dict:

    sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)
    

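    But here's a function I made to convert the per-class results (classes only, not the averages) from the text report to a pandas DataFrame.

    import pandas as pd

    def report_to_df(report):
        # Split the text report into lines, then each line into space-separated tokens.
        # Assumes class names contain no spaces.
        report = [x.split(' ') for x in report.split('\n')]
        header = ['Class Name'] + [x for x in report[0] if x != '']
        values = []
        # Skip the header and the last five lines (blank separators and summary
        # rows); the number of summary lines depends on the scikit-learn version.
        for row in report[1:-5]:
            row = [value for value in row if value != '']
            if row != []:
                values.append(row)
        df = pd.DataFrame(data=values, columns=header)
        return df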
    

    Hope this works fine for you.

  • 2021-01-31 03:58

    Another option is to calculate the underlying data and compose the report on your own. You can get all the statistics from

    precision_recall_fscore_support
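
    A minimal sketch of composing the report yourself, assuming made-up toy labels and a tab delimiter: precision_recall_fscore_support returns four arrays (precision, recall, F-score, support) with one entry per label, which can be zipped into rows and written with csv.writer. The four-decimal formatting is an arbitrary choice for this example.

```python
import csv
import io

from sklearn.metrics import precision_recall_fscore_support

# Toy labels, made up purely for illustration.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
labels = [0, 1, 2]

# Each returned array has one entry per label, in the order given by `labels`.
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels
)

# StringIO stands in for open("report.tsv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerow(["label", "precision", "recall", "f1-score", "support"])
for label, p, r, f, s in zip(labels, precision, recall, fscore, support):
    writer.writerow([label, f"{p:.4f}", f"{r:.4f}", f"{f:.4f}", int(s)])

tsv_text = buffer.getvalue()
print(tsv_text)
```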
    
  • 2021-01-31 03:59

    This is my code for a two-class (pos, neg) classification:

    import csv
    from sklearn import metrics

    # true_labels, predicted_labels, classes (pos first, then neg), fieldnames
    # and the open file object `file` are assumed to be defined elsewhere.
    report = metrics.precision_recall_fscore_support(true_labels, predicted_labels, labels=classes)
    rowDictionary = {}
    rowDictionary["precision_pos"] = report[0][0]
    rowDictionary["recall_pos"] = report[1][0]
    rowDictionary["f1-score_pos"] = report[2][0]
    rowDictionary["support_pos"] = report[3][0]
    rowDictionary["precision_neg"] = report[0][1]
    rowDictionary["recall_neg"] = report[1][1]
    rowDictionary["f1-score_neg"] = report[2][1]
    rowDictionary["support_neg"] = report[3][1]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writerow(rowDictionary)
    