How can I analyze a confusion matrix?

When I print scikit-learn's confusion matrix, I get a very large matrix. I want to work out which entries are the true positives, true negatives, etc. How can I do so? This is

3 Answers

庸人自扰

2021-01-19 05:41

IIUC, your question is not well defined. "False positives", "true negatives", and so on are terms defined only for binary classification. Read more about the definition of a confusion matrix.

In this case, the confusion matrix has dimension N × N. Each diagonal entry (i, i) counts the cases where the prediction is i and the outcome is i too. Every off-diagonal entry (i, j) counts a mistake: in scikit-learn's convention, the true class is i and the prediction is j. "Positive" and "negative" have no meaning in this case.

You can get the diagonal elements easily with np.diagonal, and from there it is easy to sum them. The number of wrong cases is the sum of the whole matrix minus the sum of the diagonal.
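
As a rough sketch of the idea above, assuming a small made-up 3-class matrix `cm` (the values are illustrative and not taken from the question), the correct and incorrect counts could be computed like this:

```python
import numpy as np

# Hypothetical 3x3 confusion matrix (rows: true class, columns: predicted class).
cm = np.array([[3, 0, 0],
               [0, 1, 2],
               [2, 1, 3]])

correct_per_class = np.diagonal(cm)   # correct predictions for each class
n_correct = correct_per_class.sum()   # sum of the diagonal
n_wrong = cm.sum() - n_correct        # everything off the diagonal

print(correct_per_class, n_correct, n_wrong)
```

Dividing `n_correct` by `cm.sum()` would give the overall accuracy.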

情书的邮戳

2021-01-19 05:55
Approach 1: Binary Classification
```
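from sklearn.metrics import confusion_matrix as cm
import pandas as pd

y_test = [1, 0, 0]
y_pred = [1, 0, 0]
# Rows are actual classes, columns are predicted classes.
matrix = cm(y_test, y_pred)

row_labels = ["Actual 0", "Actual 1"]
col_labels = ["Predicted 0", "Predicted 1"]
# Label the rows and columns so the table is readable.
pd.DataFrame(matrix, index=row_labels, columns=col_labels)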
```
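
For the binary case, the four counts can also be unpacked directly from the 2×2 matrix. A minimal sketch, assuming the labels are 0 and 1 with 1 as the positive class; the sample data below is made up for illustration:

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only (not from the question).
y_test = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]

# With labels=[0, 1], the matrix is [[tn, fp], [fn, tp]],
# so ravel() yields the counts in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)
```

Passing `labels=[0, 1]` explicitly keeps the matrix 2×2 even if one class happens to be missing from the data.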
Approach 2: Multiclass Classification

While sklearn.metrics.confusion_matrix provides a numeric matrix, you can generate a 'report' using the following:
```
import pandas as pd
y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])

pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
```
which results in:
```
Predicted  0  1  2  All
True                   
0          3  0  0    3
1          0  1  2    3
2          2  1  3    6
All        5  2  5   12
```
This allows us to see that:
1. The diagonal elements show the number of correct classifications for each class: 3, 1 and 3 for the classes 0, 1 and 2.
2. The off-diagonal elements give the misclassifications: for example, 2 samples of class 2 were misclassified as 0, none of class 0 were misclassified as 2, etc.
3. The "All" subtotals give the total number of samples for each class in both y_true and y_pred.
This method also works for text labels and, for datasets with a large number of samples, can be extended to provide percentage reports.
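
One possible way to get such a percentage report is the `normalize` argument of `pd.crosstab`. This is a sketch on the same sample data, assuming you want each row expressed as a percentage of its true class:

```python
import pandas as pd

y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])

# normalize="index" divides each row by its total, so every row sums to 1;
# multiplying by 100 turns the fractions into percentages of each true class.
pct = pd.crosstab(y_true, y_pred, rownames=["True"], colnames=["Predicted"],
                  normalize="index") * 100
print(pct.round(1))
```

Using `normalize="columns"` instead would express each column as a percentage of the predicted class.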
难免孤独

2021-01-19 05:59

Terms like true positive, false positive, etc. refer to binary classification, whereas your confusion matrix has more than two classes. So you can only talk about the number of observations known to be in group i but predicted to be in group j (the definition of a confusion matrix).
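
To illustrate reading a single entry, here is a small sketch using made-up three-class labels; the variable names `i` and `j` are chosen only for this example:

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels for three classes 0, 1 and 2.
y_true = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

# Fix the label order so that row/column k corresponds to class k.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# cm[i, j] is the number of observations in group i predicted to be in group j.
i, j = 2, 0
print(f"true class {i} predicted as {j}: {cm[i, j]}")
```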
