I'm doing multiclass text classification in Scikit-Learn. The model is a Multinomial Naive Bayes classifier trained on a dataset with hundreds of labels. Here's an extr
As mentioned in one of the posts here, precision_recall_fscore_support is analogous to classification_report.
It then suffices to use the Python library pandas to format the data in a columnar layout, similar to what classification_report does. Here is an example:
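import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support

# Reproducible toy data: 400 samples of class 0, 600 of class 1, random predictions.
np.random.seed(0)
y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    # Per-class precision, recall, f1-score and support (one array per metric).
    metrics_summary = precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred)

    # Support-weighted averages; the support entry is None here and is filled in below.
    avg = list(precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)

    # Replace the missing support of the average with the total sample count.
    support = class_report_df.loc['support']
    total = support.sum()
    avg[-1] = total

    class_report_df['avg / total'] = avg

    # Transpose so that classes are rows and metrics are columns.
    return class_report_df.T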
With classification_report you'll get something like:
print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))
Output:
             precision    recall  f1-score   support

          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600

avg / total   0.499232  0.480000  0.485248      1000
Then with our custom function pandas_classification_report:
df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)
Output:
             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0
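The avg / total row in both tables is the support-weighted mean of the per-class rows, which is what average='weighted' asks for. A quick sanity check using only the numbers printed above (the names per_class and weighted_mean are just illustrative, and the inputs are rounded to six digits, so the last digit can differ slightly):

```python
# Per-class rows copied from the output above:
# (precision, recall, f1-score, support).
per_class = {
    0: (0.379032, 0.470000, 0.419643, 400),
    1: (0.579365, 0.486667, 0.528986, 600),
}

# Total number of samples, i.e. the "support" entry of the avg / total row.
total_support = sum(row[3] for row in per_class.values())


def weighted_mean(metric_index):
    """Support-weighted mean of one metric column across all classes."""
    weighted_sum = sum(row[metric_index] * row[3] for row in per_class.values())
    return weighted_sum / total_support


# Indices 0..2 are precision, recall and f1-score.
weighted_avg = [weighted_mean(i) for i in range(3)]
print([round(v, 4) for v in weighted_avg], total_support)
# prints: [0.4992, 0.48, 0.4852] 1000
```

This matches the avg / total row to within rounding, so the custom function and classification_report agree on how the summary is built.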
Then just save it in CSV format (refer to here for other separator formatting, like sep=';'):
df_class_report.to_csv('my_csv_file.csv', sep=',')
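If you pick a different separator, the same separator has to be passed when reading the file back. Here is a small round-trip sketch, assuming a hand-built stand-in for df_class_report (filled with the values printed above) and an in-memory buffer instead of a real file:

```python
import io

import pandas as pd

# Stand-in for df_class_report, using the values printed above.
report = pd.DataFrame(
    {
        "precision": [0.379032, 0.579365, 0.499232],
        "recall": [0.470000, 0.486667, 0.480000],
        "f1-score": [0.419643, 0.528986, 0.485248],
        "support": [400.0, 600.0, 1000.0],
    },
    index=["0", "1", "avg / total"],
)

# Write with ';' as the separator; the buffer stands in for a file path.
buffer = io.StringIO()
report.to_csv(buffer, sep=";")

# Read it back with the same separator; index_col=0 restores the
# class labels as the row index.
buffer.seek(0)
restored = pd.read_csv(buffer, sep=";", index_col=0)
print(restored)
```

With a real file you would pass the path (for example 'my_csv_file.csv') in place of the buffer in both calls.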
I open my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor, like Excel).
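As a possible shortcut, newer scikit-learn releases (0.20 and later, as far as I know) let classification_report return a dictionary via output_dict=True, which pandas can turn into a table directly. This is only a sketch on a tiny hand-made example, not the random data used above, and the summary rows come out as "macro avg" and "weighted avg" rather than "avg / total":

```python
import pandas as pd
from sklearn.metrics import classification_report

# Tiny hand-made example (not the random data used earlier).
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0, 1]

# output_dict=True returns nested dicts instead of a formatted string.
report = classification_report(y_true, y_pred, output_dict=True)

# "accuracy" is a single float rather than a per-metric dict, so set it
# aside before building the table (default None in case the key is absent).
accuracy = report.pop("accuracy", None)

# Outer keys (class labels and averages) become rows after transposing.
df_report = pd.DataFrame(report).T

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df_report.to_csv(sep=",")
print(csv_text)
```

Passing a file name to to_csv instead writes the same content to disk, just like with the custom function.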