scikit-learn: how to print labels for a confusion matrix?

So I'm using scikit-learn to classify some data. I have 13 different class values/categories to classify the data into. Now I have been able to use cross validation and print t…

5 answers

醉酒成梦

2021-02-13 17:04

Since the confusion matrix is just a NumPy array, it does not contain any row or column label information. What you can do is convert the matrix into a DataFrame and then print that DataFrame.

```
import pandas as pd
import numpy as np

def cm2df(cm, labels):
    frames = []
    # rows: one single-row DataFrame per true label
    for i, row_label in enumerate(labels):
        rowdata = {}
        # columns: one entry per predicted label
        for j, col_label in enumerate(labels):
            rowdata[col_label] = cm[i, j]
        frames.append(pd.DataFrame.from_dict({row_label: rowdata}, orient='index'))
    # DataFrame.append was removed in pandas 2.0, so concatenate the rows instead
    df = pd.concat(frames)
    return df[labels]

cm = np.arange(9).reshape((3, 3))
df = cm2df(cm, ["a", "b", "c"])
print(df)
# Output:
#    a  b  c
# a  0  1  2
# b  3  4  5
# c  6  7  8
```

Code snippet is adapted from https://gist.github.com/nickynicolson/202fe765c99af49acb20ea9f77b6255e
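Because the rows and columns of the matrix already line up with the label list, the row-by-row loop in cm2df can probably be replaced by passing `index` and `columns` straight to the `pd.DataFrame` constructor. A minimal sketch, assuming `labels` is in the same order that was used to build `cm`; the name `cm_to_df` is hypothetical:

```python
import numpy as np
import pandas as pd

def cm_to_df(cm, labels):
    # Rows are true labels, columns are predicted labels, both in the given order.
    return pd.DataFrame(cm, index=labels, columns=labels)

cm = np.arange(9).reshape((3, 3))
df = cm_to_df(cm, ["a", "b", "c"])
print(df)
```

This produces the same table as cm2df without building one single-row DataFrame per label.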

夕颜

2021-02-13 17:12

From the docs, there seems to be no option to print the row and column labels of the confusion matrix. However, you can specify the label order with the argument labels=...

Example:

```
from sklearn.metrics import confusion_matrix

y_true = ['yes','yes','yes','no','no','no']
y_pred = ['yes','no','no','no','no','no']
print(confusion_matrix(y_true, y_pred))
# Output:
# [[3 0]
#  [2 1]]
print(confusion_matrix(y_true, y_pred, labels=['yes', 'no']))
# Output:
# [[1 2]
#  [0 3]]
```

If you want to print the confusion matrix with labels, you may try pandas and set the index and columns of the DataFrame.

```
import numpy as np
import pandas as pd

cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['yes', 'no']), 
    index=['true:yes', 'true:no'], 
    columns=['pred:yes', 'pred:no']
)
print(cmtx)
# Output:
#           pred:yes  pred:no
# true:yes         1        2
# true:no          0        3

unique_label = np.unique([y_true, y_pred])
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=unique_label), 
    index=['true:{:}'.format(x) for x in unique_label], 
    columns=['pred:{:}'.format(x) for x in unique_label]
)
print(cmtx)
# Output:
#           pred:no  pred:yes
# true:no         3         0
# true:yes        2         1
```
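For a 13-class problem like the one in the question, the same idea can be wrapped in a small reusable helper. A sketch, assuming scikit-learn and pandas; the name `labeled_confusion_matrix` is hypothetical. It uses `sklearn.utils.multiclass.unique_labels`, which returns the sorted union of labels (the default ordering `confusion_matrix` itself uses), and names the axes `true` and `pred` instead of prefixing every label. Depending on pandas display options, a wide frame may be truncated or wrapped when printed; `DataFrame.to_string()` renders every row and column.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

def labeled_confusion_matrix(y_true, y_pred, labels=None):
    """Return the confusion matrix as a DataFrame with named axes."""
    if labels is None:
        # Sorted union of both inputs, the same default confusion_matrix uses
        labels = unique_labels(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(
        cm,
        index=pd.Index(labels, name="true"),
        columns=pd.Index(labels, name="pred"),
    )

y_true = ['yes', 'yes', 'yes', 'no', 'no', 'no']
y_pred = ['yes', 'no', 'no', 'no', 'no', 'no']
cmtx = labeled_confusion_matrix(y_true, y_pred)
# to_string() prints the whole frame, so a wide matrix is not abbreviated
print(cmtx.to_string())
```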

小鲜肉

2021-02-13 17:18

A better way to do this is to use the crosstab function in pandas:

```
pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
```

or, if the labels were encoded with a LabelEncoder `le`:

```
pd.crosstab(le.inverse_transform(y_true), le.inverse_transform(y_pred), rownames=['True'], colnames=['Predicted'], margins=True)
```
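A self-contained version of the crosstab approach. The answer does not show where `le` comes from, so this assumes (hypothetically) that the string labels were integer-encoded with scikit-learn's `LabelEncoder` before training and are decoded back to names for display:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical setup: labels were integer-encoded before training
le = LabelEncoder()
y_true = le.fit_transform(['yes', 'yes', 'yes', 'no', 'no', 'no'])
y_pred = le.transform(['yes', 'no', 'no', 'no', 'no', 'no'])

# Decode back to the original names so they appear as row/column labels
ct = pd.crosstab(
    le.inverse_transform(y_true),
    le.inverse_transform(y_pred),
    rownames=['True'],
    colnames=['Predicted'],
    margins=True,  # adds an "All" row and column with totals
)
print(ct)
```

One difference from `confusion_matrix(..., labels=...)`: crosstab only creates columns for values that actually occur, so a class that is never predicted gets no column at all.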

愿得一人

2021-02-13 17:29
It appears your data has 13 different classes, which is why your confusion matrix has 13 rows and columns. Furthermore, your classes aren't labeled in any way, just integers from what I can see.

If this isn't the case and your training data has actual labels, you can pass a list of unique labels to confusion_matrix:
```
# labels must be passed by keyword in recent scikit-learn versions
conf_mat = confusion_matrix(class_label, class_label_predicted, labels=df['task'].unique())
```
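Passing `labels` also forces a row and a column for every listed class, even one the model never predicts. A sketch with made-up data standing in for the question's `df['task']` column (the class names are hypothetical). Note that `Series.unique()` returns classes in order of first appearance, not sorted order:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical toy data standing in for the question's df['task'] column
df = pd.DataFrame({'task': ['cat', 'dog', 'bird', 'dog', 'cat']})
class_label = df['task']
class_label_predicted = ['cat', 'dog', 'dog', 'dog', 'cat']  # 'bird' is never predicted

# unique() keeps the order of first appearance: cat, dog, bird
labels = df['task'].unique()
conf_mat = confusion_matrix(class_label, class_label_predicted, labels=labels)

# Attach the same labels to rows (true) and columns (predicted)
conf_df = pd.DataFrame(conf_mat, index=labels, columns=labels)
print(conf_df)
```

The `bird` column is all zeros, which is how a never-predicted class shows up in the matrix.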
一整个雨季

2021-02-13 17:30
It is important to ensure that the way you label your confusion matrix rows and columns corresponds exactly to the way sklearn has coded the classes. The true order of the labels can be revealed using the .classes_ attribute of the classifier. You can use the code below to prepare a confusion matrix data frame.
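```
import pandas as pd
from sklearn.metrics import confusion_matrix
labels = rfc.classes_  # label order of the fitted classifier
conf_df = pd.DataFrame(confusion_matrix(class_label, class_label_predicted, labels=labels), columns=labels, index=labels)
conf_df.index.name = 'True labels'
```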
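A self-contained sketch of the `classes_` approach. It assumes a `RandomForestClassifier` named `rfc`, as in the answer, and uses the iris dataset bundled with scikit-learn as stand-in data. Since the question mentions cross validation, `cross_val_predict` supplies one out-of-fold prediction per sample; it works on clones, so `rfc` is fitted separately to populate `classes_`:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Stand-in data: iris species names play the role of the question's class labels
iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string labels such as 'setosa'

rfc = RandomForestClassifier(random_state=0)
# Out-of-fold predictions from 5-fold cross-validation (uses clones of rfc)
y_pred = cross_val_predict(rfc, X, y, cv=5)
rfc.fit(X, y)  # fitting rfc itself populates rfc.classes_

labels = rfc.classes_
conf_df = pd.DataFrame(
    confusion_matrix(y, y_pred, labels=labels),
    index=labels,
    columns=labels,
)
conf_df.index.name = 'True labels'
conf_df.columns.name = 'Predicted labels'
print(conf_df)
```

Passing `labels=rfc.classes_` pins the row and column order to the classifier's own class order, so the DataFrame labels cannot drift out of sync with the counts.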
The second thing to note is that your classifier is not predicting labels well. The number of correctly predicted samples for each class is shown on the main diagonal of the confusion matrix. You have non-zero values across the whole matrix, and some classes have not been predicted at all: those are the columns that are all zero. It might be a good idea to run the classifier with its default parameters first and then try to optimise them.
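To make the diagonal reading concrete, here is a hypothetical helper (`summarize_confusion` is not a library function) that pulls per-class counts out of a labeled confusion matrix such as `conf_df`. It assumes rows are true labels and columns are predicted labels in the same order: the diagonal gives correct predictions, row sums give how many samples truly belong to each class, and an all-zero column marks a class the model never predicts. The 3-class matrix is made-up data.

```python
import numpy as np
import pandas as pd

def summarize_confusion(conf_df):
    """Per-class summary of a labeled confusion matrix (rows = true, columns = predicted)."""
    cm = conf_df.to_numpy()
    correct = np.diag(cm)       # correctly predicted samples per class
    support = cm.sum(axis=1)    # true samples per class (row sums)
    predicted = cm.sum(axis=0)  # how often each class was predicted (column sums)
    # Recall = correct / support; NaN for a class with no true samples
    with np.errstate(divide='ignore', invalid='ignore'):
        recall = np.where(support > 0, correct / support, np.nan)
    summary = pd.DataFrame(
        {'correct': correct, 'support': support, 'recall': recall, 'predicted': predicted},
        index=conf_df.index,
    )
    never_predicted = list(conf_df.columns[predicted == 0])
    return summary, never_predicted

# Hypothetical 3-class matrix in which class 'c' is never predicted
conf_df = pd.DataFrame(
    [[4, 1, 0], [2, 3, 0], [1, 2, 0]],
    index=['a', 'b', 'c'],
    columns=['a', 'b', 'c'],
)
summary, never_predicted = summarize_confusion(conf_df)
print(summary)
print('Never predicted:', never_predicted)
```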