scikit learn output metrics.classification_report into CSV/tab-delimited format

后端未结

关注

 17  2204

I\'m doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here\'s an extr

相关标签:

17条回答

不思量自难忘°

2021-01-31 04:07

Along with example input-output, here's the other function metrics_report_to_df(). Implementing precision_recall_fscore_support from Sklearn metrics should do:

# Generates classification metrics using precision_recall_fscore_support:
from sklearn import metrics
import pandas as pd
import numpy as np; from numpy import random

# Simulating true and predicted labels as test dataset: 
np.random.seed(10)
y_true = np.array([0]*300 + [1]*700)
y_pred = np.random.randint(2, size=1000)

# Here's the custom function returning classification report dataframe:
def metrics_report_to_df(ytrue, ypred):
    precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred)
    classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1)
    classification_report.columns = ["precision", "recall", "f1-score", "support"] # Add row w "avg/total"
    classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted')
    classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum() 
    return(classification_report)

# Provide input as true_label and predicted label (from classifier)
classification_report = metrics_report_to_df(y_true, y_pred)

# Here's the output (metrics report transformed to dataframe )
In [1047]: classification_report
Out[1047]: 
           precision    recall  f1-score  support
0           0.300578  0.520000  0.380952    300.0
1           0.700624  0.481429  0.570703    700.0
avg/Total   0.580610  0.493000  0.513778   1000.0

0 讨论(0)

温柔的废话

2021-01-31 04:10

I have modified @kindjacket's answer. Try this:

import collections
def classification_report_df(report):
    report_data = []
    lines = report.split('\n')
    del lines[-5]
    del lines[-1]
    del lines[1]
    for line in lines[1:]:
        row = collections.OrderedDict()
        row_data = line.split()
        row_data = list(filter(None, row_data))
        row['class'] = row_data[0] + " " + row_data[1]
        row['precision'] = float(row_data[2])
        row['recall'] = float(row_data[3])
        row['f1_score'] = float(row_data[4])
        row['support'] = int(row_data[5])
        report_data.append(row)
    df = pd.DataFrame.from_dict(report_data)
    df.set_index('class', inplace=True)
    return df

You can just export that df to csv using pandas

0 讨论(0)

谎友^

2021-01-31 04:11

While the previous answers are probably all working I found them a bit verbose. The following stores the individual class results as well as the summary line in a single dataframe. Not very sensitive to changes in the report but did the trick for me.

#init snippet and fake data
from io import StringIO
import re
import pandas as pd
from sklearn import metrics
true_label = [1,1,2,2,3,3]
pred_label = [1,2,2,3,3,1]

def report_to_df(report):
    report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
    report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)        
    return(report_df)

#txt report to df
report = metrics.classification_report(true_label, pred_label)
report_df = report_to_df(report)

#store, print, copy...
print (report_df)

Which gives the desired output:

Classes precision   recall  f1-score    support
1   0.5 0.5 0.5 2
2   0.5 0.5 0.5 2
3   0.5 0.5 0.5 2
avg/total   0.5 0.5 0.5 6

0 讨论(0)

鱼传尺愫

2021-01-31 04:13
As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas Dataframe is by simply having the report returned as a dict:
```
report = classification_report(y_test, y_pred, output_dict=True)
```
and then construct a Dataframe and transpose it:
```
df = pandas.DataFrame(report).transpose()
```
From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).

See also the documentation at https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html
0 讨论(0)
发布评论:

提交评论
- 加载中...
南旧

2021-01-31 04:14

I had the same problem what i did was, paste the string output of metrics.classification_report into google sheets or excel and split the text into columns by custom 5 whitespaces.

0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2 3