scikit learn output metrics.classification_report into CSV/tab-delimited format

青春惊慌失措 2021-01-31 03:08

I'm doing a multiclass text classification in scikit-learn. The model is a Multinomial Naive Bayes classifier trained on a dataset with hundreds of labels. Here's an extr

17 Answers
  • 2021-01-31 04:07

    Here is another function, metrics_report_to_df(), along with example input and output. It builds the report from precision_recall_fscore_support in sklearn.metrics:

    # Generates classification metrics using precision_recall_fscore_support:
    from sklearn import metrics
    import pandas as pd
    import numpy as np
    
    # Simulating true and predicted labels as test dataset: 
    np.random.seed(10)
    y_true = np.array([0]*300 + [1]*700)
    y_pred = np.random.randint(2, size=1000)
    
    # Here's the custom function returning classification report dataframe:
    def metrics_report_to_df(ytrue, ypred):
        precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred)
        classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1)
        classification_report.columns = ["precision", "recall", "f1-score", "support"]
        # Add an "avg/Total" row: weighted averages, with support set to the total sample count
        classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted')
        classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum()
        return classification_report
    
    # Provide input as true_label and predicted label (from classifier)
    classification_report = metrics_report_to_df(y_true, y_pred)
    
    # Here's the output (metrics report transformed to dataframe )
    In [1047]: classification_report
    Out[1047]: 
               precision    recall  f1-score  support
    0           0.300578  0.520000  0.380952    300.0
    1           0.700624  0.481429  0.570703    700.0
    avg/Total   0.580610  0.493000  0.513778   1000.0
    
  • 2021-01-31 04:10

    I have modified @kindjacket's answer. Try this:

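    import collections
    import pandas as pd

    def classification_report_df(report):
        report_data = []
        lines = report.split('\n')
        # Drop lines that are not data rows; these positions depend on the report layout
        del lines[-5]
        del lines[-1]
        del lines[1]
        for line in lines[1:]:
            row = collections.OrderedDict()
            row_data = line.split()
            row_data = list(filter(None, row_data))
            # Assumes every row label consists of exactly two tokens
            row['class'] = row_data[0] + " " + row_data[1]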
            row['precision'] = float(row_data[2])
            row['recall'] = float(row_data[3])
            row['f1_score'] = float(row_data[4])
            row['support'] = int(row_data[5])
            report_data.append(row)
        df = pd.DataFrame.from_dict(report_data)
        df.set_index('class', inplace=True)
        return df
    

    You can then export that DataFrame to CSV using pandas.
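
    As a minimal sketch of that export step: the DataFrame below is a hypothetical stand-in for the result of classification_report_df(), with made-up numbers, and to_csv() is called without a path so that it returns the text instead of writing a file.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by classification_report_df();
# the labels and numbers are made up for illustration.
df = pd.DataFrame(
    {
        "precision": [0.50, 0.75],
        "recall": [0.60, 0.70],
        "f1_score": [0.55, 0.72],
        "support": [10, 20],
    },
    index=pd.Index(["class a", "class b"], name="class"),
)

# Comma-separated text; pass a file path as the first argument to write to disk.
csv_text = df.to_csv()

# Tab-delimited text: the same call with sep="\t".
tsv_text = df.to_csv(sep="\t")

print(csv_text)
print(tsv_text)
```

    Passing a path, for example df.to_csv("report.tsv", sep="\t") with a file name of your choice, writes the same content to a file.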

  • 2021-01-31 04:11

    While the previous answers probably all work, I found them a bit verbose. The following stores the individual class results as well as the summary line in a single DataFrame. It is not very robust to changes in the report format, but it did the trick for me.

    # Setup and fake data
    from io import StringIO
    import re
    import pandas as pd
    from sklearn import metrics
    true_label = [1, 1, 2, 2, 3, 3]
    pred_label = [1, 2, 2, 3, 3, 1]
    
    def report_to_df(report):
        # Collapse runs of spaces, join "avg / total" into one token, strip the leading space on each line
        report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
        # Prepend a name for the label column so the header matches the data rows
        report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)
        return report_df
    
    # Text report to DataFrame
    report = metrics.classification_report(true_label, pred_label)
    report_df = report_to_df(report)
    
    # Store, print, copy...
    print(report_df)
    

    Which gives the desired output:

    Classes precision   recall  f1-score    support
    1   0.5 0.5 0.5 2
    2   0.5 0.5 0.5 2
    3   0.5 0.5 0.5 2
    avg/total   0.5 0.5 0.5 6
    
  • 2021-01-31 04:13

    As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

    from sklearn.metrics import classification_report
    report = classification_report(y_test, y_pred, output_dict=True)
    

    and then construct a DataFrame and transpose it:

    import pandas
    df = pandas.DataFrame(report).transpose()
    

    From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).

    See also the documentation at https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html
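
    Putting those steps together, a minimal end-to-end sketch might look like the following. The labels are made up for illustration, and to_csv() is called without a path so it returns the text rather than writing a file.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Made-up true and predicted labels, for illustration only.
y_test = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]

# Ask for a dict instead of the formatted string (scikit-learn >= 0.20).
report = classification_report(y_test, y_pred, output_dict=True)

# The dict is keyed by row label, so transpose to get one row per class or average.
df = pd.DataFrame(report).transpose()

# Comma-separated and tab-delimited text; pass a path to write a file instead.
csv_text = df.to_csv()
tsv_text = df.to_csv(sep="\t")

print(df)
```

    If the returned dict contains a scalar entry such as accuracy, pandas repeats that value across every column of its row, so you may want to drop or handle that row separately before exporting.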

  • 2021-01-31 04:14

    I had the same problem. What I did was paste the string output of metrics.classification_report into Google Sheets or Excel and split the text into columns using a custom separator of 5 spaces.
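
    The same column-splitting idea can be sketched in code. This is only a rough equivalent under stated assumptions: the report text below is a made-up sample in the older "avg / total" layout, and columns are split on runs of two or more spaces (rather than exactly five) so that a label containing single spaces stays in one cell.

```python
import csv
import io
import re

# Made-up sample of the text that classification_report prints (older layout).
report = """\
             precision    recall  f1-score   support

          1       0.50      0.50      0.50         2
          2       0.50      0.50      0.50         2
          3       0.50      0.50      0.50         2

avg / total       0.50      0.50      0.50         6
"""

rows = []
for line in report.splitlines():
    if not line.strip():
        continue  # skip the blank separator lines
    # Split on runs of 2+ whitespace characters so "avg / total" stays together.
    rows.append(re.split(r"\s{2,}", line.strip()))

# The header line has no label column, so add one (the name "class" is arbitrary).
rows[0] = ["class"] + rows[0]

# Write the rows as tab-delimited text.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
tsv_text = buffer.getvalue()

print(tsv_text)
```

    Rows that carry fewer values than the header would not line up with this approach and would need special handling.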
