scikit learn output metrics.classification_report into CSV/tab-delimited format

后端未结

关注

 17  2202

I\'m doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here\'s an extr

相关标签:

17条回答

情歌与酒

2021-01-31 03:48
The way I have always solved output problems is like what I've mentioned in my previous comment, I've converted my output to a DataFrame. Not only is it incredibly easy to send to files (see here), but Pandas is really easy to manipulate the data structure. The other way I have solved this is writing the output line-by-line using CSV and specifically using writerow.

If you manage to get the output into a dataframe it would be
```
dataframe_name_here.to_csv()
```
or if using CSV it would be something like the example they provide in the CSV link.
0 讨论(0)
发布评论:

提交评论
- 加载中...
灰色年华

2021-01-31 03:50
Just import pandas as pd and make sure that you set the output_dict parameter which by default is False to True when computing the classification_report. This will result in an classification_report dictionary which you can then pass to a pandas DataFrame method. You may want to transpose the resulting DataFrame to fit the fit the output format that you want. The resulting DataFrame may then be written to a csv file as you wish.
```
clsf_report = pd.DataFrame(classification_report(y_true = your_y_true, y_pred = your_y_preds5, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index= True)
```
I hope this helps.
0 讨论(0)
发布评论:

提交评论
- 加载中...

说谎

2021-01-31 03:50

As mentioned in one of the posts in here, precision_recall_fscore_support is analogous to classification_report.

Then it suffices to use python library pandas to easily format the data in a columnar format, similar to what classification_report does. Here is an example:

import numpy as np
import pandas as pd

from sklearn.metrics import classification_report
from  sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)

y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    metrics_summary = precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred)

    avg = list(precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred,
            average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)

    support = class_report_df.loc['support']
    total = support.sum() 
    avg[-1] = total

    class_report_df['avg / total'] = avg

    return class_report_df.T

With classification_report You'll get something like:

print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))

Output:

             precision    recall  f1-score   support

          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600

avg / total   0.499232  0.480000  0.485248      1000

Then with our custom funtion pandas_classification_report:

df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)

Output:

             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0

Then just save it to csv format (refer to here for other separator formating like sep=';'):

df_class_report.to_csv('my_csv_file.csv',  sep=',')

I open my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor like excel):

0 讨论(0)

说谎

2021-01-31 03:56

It's obviously a better idea to just output the classification report as dict:

sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)

But here's a function I made to convert all classes (only classes) results to a pandas dataframe.

def report_to_df(report):
    report = [x.split(' ') for x in report.split('\n')]
    header = ['Class Name']+[x for x in report[0] if x!='']
    values = []
    for row in report[1:-5]:
        row = [value for value in row if value!='']
        if row!=[]:
            values.append(row)
    df = pd.DataFrame(data = values, columns = header)
    return df

Hope this works fine for you.

0 讨论(0)

耶瑟儿～

2021-01-31 03:58
Another option is to calculate the underlying data and compose the report on your own. All the statistics you will get by
```
precision_recall_fscore_support
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

北荒

2021-01-31 03:59

This is my code for 2 classes(pos,neg) classification

report = metrics.precision_recall_fscore_support(true_labels,predicted_labels,labels=classes)
        rowDicionary["precision_pos"] = report[0][0]
        rowDicionary["recall_pos"] = report[1][0]
        rowDicionary["f1-score_pos"] = report[2][0]
        rowDicionary["support_pos"] = report[3][0]
        rowDicionary["precision_neg"] = report[0][1]
        rowDicionary["recall_neg"] = report[1][1]
        rowDicionary["f1-score_neg"] = report[2][1]
        rowDicionary["support_neg"] = report[3][1]
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writerow(rowDicionary)

0 讨论(0)

1 2 3 下一页