scikit-learn: how to print labels for a confusion matrix?

孤城傲影 2021-02-13 16:52

So I'm using scikit-learn to classify some data. I have 13 different class values/categories to classify the data into. Now I have been able to use cross validation and print t

5 answers
  • 2021-02-13 17:04

    Since the confusion matrix is just a NumPy array, it carries no row or column labels. What you can do is convert the matrix into a DataFrame and print that.

    import pandas as pd
    import numpy as np
    
    def cm2df(cm, labels):
        # Rows are the true classes and columns the predicted classes,
        # both in the order given by `labels`.
        # Built in one step because DataFrame.append no longer exists in pandas 2.x.
        return pd.DataFrame(cm, index=labels, columns=labels)
    
    cm = np.arange(9).reshape((3, 3))
    df = cm2df(cm, ["a", "b", "c"])
    print(df)
    

    Code snippet is adapted from https://gist.github.com/nickynicolson/202fe765c99af49acb20ea9f77b6255e

    Output:

       a  b  c
    a  0  1  2
    b  3  4  5
    c  6  7  8
    
  • 2021-02-13 17:12

    From the documentation, confusion_matrix does not seem to have an option for printing the row and column labels. However, you can specify the label order with the labels argument.

    Example:

    from sklearn.metrics import confusion_matrix
    
    y_true = ['yes','yes','yes','no','no','no']
    y_pred = ['yes','no','no','no','no','no']
    print(confusion_matrix(y_true, y_pred))
    # Output:
    # [[3 0]
    #  [2 1]]
    print(confusion_matrix(y_true, y_pred, labels=['yes', 'no']))
    # Output:
    # [[1 2]
    #  [0 3]]
    

    If you want to print the confusion matrix with labels, you may try pandas and set the index and columns of the DataFrame.

    import pandas as pd
    cmtx = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=['yes', 'no']), 
        index=['true:yes', 'true:no'], 
        columns=['pred:yes', 'pred:no']
    )
    print(cmtx)
    # Output:
    #           pred:yes  pred:no
    # true:yes         1        2
    # true:no          0        3
    

    Or

    import numpy as np
    unique_label = np.unique([y_true, y_pred])
    cmtx = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=unique_label), 
        index=['true:{:}'.format(x) for x in unique_label], 
        columns=['pred:{:}'.format(x) for x in unique_label]
    )
    print(cmtx)
    # Output:
    #           pred:no  pred:yes
    # true:no         3         0
    # true:yes        2         1
    
  • 2021-02-13 17:18

    Another, arguably better, way of doing this is to use the crosstab function in pandas:

    pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)

    or, if the labels were integer-encoded (le here is presumably a fitted LabelEncoder):

    pd.crosstab(le.inverse_transform(y_true), le.inverse_transform(y_pred),rownames=['True'], colnames=['Predicted'], margins=True)
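
    One difference from confusion_matrix is worth keeping in mind: crosstab only creates rows and columns for values that actually occur, so a class that is never predicted has no column at all. The sketch below uses made-up toy data (the series contents and the labels list are assumptions for illustration) and restores the square layout with reindex.

```python
import pandas as pd

# Toy data (illustrative only): class "c" occurs but is never predicted
y_true = pd.Series(["a", "a", "b", "b", "c", "c"], name="True")
y_pred = pd.Series(["a", "b", "b", "b", "a", "b"], name="Predicted")
labels = ["a", "b", "c"]

# crosstab only creates columns for values that actually appear in y_pred,
# so this table has three rows but just two columns
ct = pd.crosstab(y_true, y_pred)

# reindex restores the full square layout, filling the missing cells with 0
ct_full = ct.reindex(index=labels, columns=labels, fill_value=0)
print(ct_full)
```

    Passing margins=True to crosstab, as in the calls above, additionally appends an "All" row and column with the totals.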

  • 2021-02-13 17:29

    It appears your data has 13 different classes, which is why your confusion matrix has 13 rows and columns. Furthermore, your classes aren't labeled in any way, just integers from what I can see.

    If this isn't the case, and your training data has actual labels, you can pass a list of unique labels to confusion_matrix through its labels argument (keyword-only in recent scikit-learn versions):

    conf_mat = confusion_matrix(class_label, class_label_predicted, labels=df['task'].unique())
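
    As a rough illustration of what the labels argument changes, here is a small sketch with made-up data (the class names and the all_labels list are assumptions, not taken from the question). It shows that the given order is respected and that a class absent from the data still gets its own row and column.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical label set; "bird" never occurs in the toy data below
all_labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]

# Without labels= the matrix covers only the classes present, in sorted order
cm_default = confusion_matrix(y_true, y_pred)

# With labels= the matrix follows the given order and keeps absent classes
cm_fixed = confusion_matrix(y_true, y_pred, labels=all_labels)
conf_df = pd.DataFrame(cm_fixed, index=all_labels, columns=all_labels)
print(conf_df)
```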
    
  • 2021-02-13 17:30

    It is important to ensure that the way you label your confusion matrix rows and columns corresponds exactly to the way sklearn has coded the classes. The true order of the labels can be revealed using the .classes_ attribute of the classifier. You can use the code below to prepare a confusion matrix data frame.

    labels = rfc.classes_
    conf_df = pd.DataFrame(confusion_matrix(class_label, class_label_predicted, labels=labels), columns=labels, index=labels)
    conf_df.index.name = 'True labels'
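
    For a version that runs end to end, here is a minimal sketch on toy data. The DecisionTreeClassifier, the feature values and the class names are stand-ins chosen for illustration, since the original rfc and data are not shown.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Toy data (illustrative only): one feature, three string classes
X = [[0], [1], [2], [3], [4], [5]]
y = ["low", "low", "mid", "mid", "high", "high"]

# Stand-in for the rfc classifier in the snippet above
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
y_pred = clf.predict(X)

# classes_ holds the labels in the order the estimator uses internally
labels = clf.classes_
conf_df = pd.DataFrame(
    confusion_matrix(y, y_pred, labels=labels),
    index=labels,
    columns=labels,
)
conf_df.index.name = "True labels"
conf_df.columns.name = "Predicted labels"
print(conf_df)
```

    Note that classes_ is sorted, so the rows come out as high, low, mid rather than in the order the classes first appear in y.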
    

    The second thing to note is that your classifier is not predicting labels well. The number of correctly predicted labels is shown on the main diagonal of the confusion matrix. You have non-zero values across the matrix, and some classes have not been predicted at all: these are the columns that are all zero. It might be a good idea to run the classifier with its default parameters and then try to optimise them.
