I'm doing multiclass text classification in scikit-learn. The dataset has hundreds of labels and is being trained with the Multinomial Naive Bayes classifier. Here's an extr
The way I have always solved output problems is the one I mentioned in my previous comment: I convert the output to a DataFrame. Not only is a DataFrame incredibly easy to send to files (see here), but Pandas also makes the data structure easy to manipulate. The other way I have solved this is to write the output line by line with the csv module, specifically using writerow.
If you manage to get the output into a DataFrame, it would be dataframe_name_here.to_csv(); if you use the csv module, it would be something like the example provided in the CSV link.
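For the writerow route, a minimal sketch might look like the following. It assumes the per-label scores are already in a dict of dicts, which is the layout classification_report(..., output_dict=True) returns in recent scikit-learn versions. The names write_report_rows and sample_report are made up for this illustration, and the sample numbers are copied from a report shown in another answer on this page.

```python
import csv
import io

COLUMNS = ["precision", "recall", "f1-score", "support"]


def write_report_rows(report, handle):
    """Write one CSV row per label from a classification_report-style dict."""
    writer = csv.writer(handle)
    writer.writerow(["label"] + COLUMNS)
    for label, scores in report.items():
        # Scalar entries such as 'accuracy' are not per-label rows; skip them.
        if not isinstance(scores, dict):
            continue
        writer.writerow([label] + [scores[column] for column in COLUMNS])


# Hypothetical stand-in for classification_report(..., output_dict=True).
sample_report = {
    "0": {"precision": 0.379032, "recall": 0.47, "f1-score": 0.419643, "support": 400},
    "1": {"precision": 0.579365, "recall": 0.486667, "f1-score": 0.528986, "support": 600},
    "accuracy": 0.48,
}

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("report.csv", "w", newline="") instead.
buffer = io.StringIO()
write_report_rows(sample_report, buffer)
csv_text = buffer.getvalue()
```

With hundreds of labels this still writes one line per label, so it scales the same way the DataFrame approach does.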
Just import pandas as pd and make sure that you set the output_dict parameter, which is False by default, to True when computing the classification_report. This produces a classification_report dictionary, which you can then pass to the pandas DataFrame constructor. You may want to transpose the resulting DataFrame to fit the output format that you want. The resulting DataFrame can then be written to a CSV file as you wish.
import pandas as pd
from sklearn.metrics import classification_report
clsf_report = pd.DataFrame(classification_report(y_true=your_y_true, y_pred=your_y_preds5, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index=True)
I hope this helps.
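One detail worth checking: in recent scikit-learn versions the dictionary may also contain a scalar 'accuracy' entry next to the per-label dictionaries. The sketch below, which is only one possible way to handle it, separates that scalar before building the DataFrame. The helper report_dict_to_frame and the hand-written sample_report are assumptions for illustration (the numbers come from the example report in another answer here, and the 'weighted avg' key name is assumed from newer versions).

```python
import pandas as pd


def report_dict_to_frame(report):
    """Split a classification_report-style dict into (per-row DataFrame, accuracy)."""
    report = dict(report)  # shallow copy so the caller's dict is left untouched
    accuracy = report.pop("accuracy", None)  # scalar, not a row of metrics
    frame = pd.DataFrame(report).transpose()  # one row per label or average
    return frame, accuracy


# Hypothetical stand-in for classification_report(..., output_dict=True).
sample_report = {
    "0": {"precision": 0.379032, "recall": 0.47, "f1-score": 0.419643, "support": 400},
    "1": {"precision": 0.579365, "recall": 0.486667, "f1-score": 0.528986, "support": 600},
    "accuracy": 0.48,
    "weighted avg": {"precision": 0.499232, "recall": 0.48, "f1-score": 0.485248, "support": 1000},
}

frame, accuracy = report_dict_to_frame(sample_report)
# frame.to_csv("report.csv") would then write the three rows as usual.
```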
As mentioned in one of the posts here, precision_recall_fscore_support is analogous to classification_report.
Then it suffices to use the Python library pandas to format the data in a columnar layout, similar to what classification_report does. Here is an example:
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)
y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    # Per-class precision, recall, f1-score and support.
    metrics_summary = precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred)

    # Weighted averages; support is None here, so it is filled in below.
    avg = list(precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)

    support = class_report_df.loc['support']
    total = support.sum()
    avg[-1] = total  # replace the None support with the total sample count

    class_report_df['avg / total'] = avg

    # Transpose so each class is a row, as in classification_report.
    return class_report_df.T
With classification_report you'll get something like:
print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))
Output:
             precision    recall  f1-score   support
          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600
avg / total   0.499232  0.480000  0.485248      1000
Then with our custom function pandas_classification_report:
df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)
Output:
             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0
Then just save it to CSV format (refer to here for other separator formatting, such as sep=';'):
df_class_report.to_csv('my_csv_file.csv', sep=',')
I opened my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor, such as Excel).
It's obviously a better idea to just output the classification report as a dict:
sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)
But here's a function I made to convert the per-class results (classes only) of the text report to a pandas DataFrame.
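import pandas as pd

def report_to_df(report):
    # report is the text returned by classification_report (not the dict).
    report = [x.split(' ') for x in report.split('\n')]
    header = ['Class Name'] + [x for x in report[0] if x != '']
    values = []
    # Skip the header and the trailing summary lines; the -5 offset may need
    # adjusting to the number of summary lines your scikit-learn version prints.
    for row in report[1:-5]:
        row = [value for value in row if value != '']
        if row != []:
            values.append(row)
    df = pd.DataFrame(data=values, columns=header)
    return df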
Hope this works fine for you.
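If the fixed -5 offset does not match your report, a variant that stops at the first blank line after the class rows might be easier to keep working. The following is only a sketch under the assumption that the text report starts with the header line, followed by a blank line, the class rows, another blank line, and then the summary rows. parse_class_rows is a hypothetical helper, not part of scikit-learn, and the sample text is typed in by hand from the example output in another answer here.

```python
def parse_class_rows(report_text):
    """Return one dict per class row of a classification_report-style text."""
    rows = []
    for line in report_text.splitlines()[1:]:  # first line is the header
        if not line.strip():
            if rows:
                break  # blank line after the class rows: summaries follow
            continue  # blank line between header and class rows
        # Split from the right so class names containing spaces stay whole.
        parts = line.rsplit(None, 4)
        if len(parts) != 5:
            continue
        name, precision, recall, f1_score, support = parts
        rows.append({
            "Class Name": name.strip(),
            "precision": float(precision),
            "recall": float(recall),
            "f1-score": float(f1_score),
            "support": int(support),
        })
    return rows


# Hand-typed sample in the assumed layout.
sample_text = "\n".join([
    "             precision    recall  f1-score   support",
    "",
    "          0   0.379032  0.470000  0.419643       400",
    "          1   0.579365  0.486667  0.528986       600",
    "",
    "avg / total   0.499232  0.480000  0.485248      1000",
])

class_rows = parse_class_rows(sample_text)
```

The resulting list of dicts can be passed straight to pd.DataFrame(class_rows) or to csv.DictWriter.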
Another option is to calculate the underlying data and compose the report on your own. You can get all the statistics from precision_recall_fscore_support.
This is my code for a two-class (pos, neg) classification:
import csv
from sklearn import metrics

# true_labels, predicted_labels and classes come from your own pipeline.
# report is a tuple of four arrays: precision, recall, f1-score, support.
report = metrics.precision_recall_fscore_support(true_labels, predicted_labels, labels=classes)
rowDicionary = {}
rowDicionary["precision_pos"] = report[0][0]
rowDicionary["recall_pos"] = report[1][0]
rowDicionary["f1-score_pos"] = report[2][0]
rowDicionary["support_pos"] = report[3][0]
rowDicionary["precision_neg"] = report[0][1]
rowDicionary["recall_neg"] = report[1][1]
rowDicionary["f1-score_neg"] = report[2][1]
rowDicionary["support_neg"] = report[3][1]
# file is an open, writable file object; fieldnames lists the keys above.
writer = csv.DictWriter(file, fieldnames=fieldnames)
writer.writerow(rowDicionary)
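The same idea can be written as a loop so it does not have to be spelled out per class. This is a rough sketch, not the code above: flatten_scores is a hypothetical helper, the scores tuple is a hand-written stand-in for the four arrays that precision_recall_fscore_support returns (numbers borrowed from the example report in another answer here), and which class is called pos or neg is chosen arbitrarily for the illustration.

```python
import csv
import io

METRIC_NAMES = ["precision", "recall", "f1-score", "support"]


def flatten_scores(scores, suffixes):
    """Turn (precision, recall, f1, support) sequences into one flat dict.

    suffixes[i] names the class at position i, in the same order as the
    labels passed to precision_recall_fscore_support.
    """
    row = {}
    for position, suffix in enumerate(suffixes):
        for metric_name, values in zip(METRIC_NAMES, scores):
            row[f"{metric_name}_{suffix}"] = values[position]
    return row


# Stand-in for metrics.precision_recall_fscore_support(..., labels=classes).
scores = (
    [0.379032, 0.579365],  # precision per class
    [0.47, 0.486667],      # recall per class
    [0.419643, 0.528986],  # f1-score per class
    [400, 600],            # support per class
)

row = flatten_scores(scores, ["pos", "neg"])

# In-memory buffer for the sketch; use open(path, "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Deriving fieldnames from the row itself keeps the header and the values in step if more classes are added to the suffix list.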