My problem:
I have a dataset which is a large JSON file. I read it and store it in the trainList variable.
Next, I pre-process
For the multi-class case, everything you need can be found from the confusion matrix (true classes as rows, predicted classes as columns). For each class, the diagonal entry gives its true positives, the rest of its column gives its false positives, the rest of its row gives its false negatives, and everything else counts as its true negatives.
Using pandas/numpy, you can compute this for all classes at once like so:
import numpy as np

# confusion_matrix is assumed to be a pandas DataFrame here
# (true classes as rows, predicted classes as columns);
# for a plain NumPy array, use confusion_matrix.sum() instead of
# confusion_matrix.values.sum() below.
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
TP = np.diag(confusion_matrix)
TN = confusion_matrix.values.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP / (TP + FN)
# Specificity or true negative rate
TNR = TN / (TN + FP)
# Precision or positive predictive value
PPV = TP / (TP + FP)
# Negative predictive value
NPV = TN / (TN + FN)
# Fall-out or false positive rate
FPR = FP / (FP + TN)
# False negative rate
FNR = FN / (FN + TP)
# False discovery rate
FDR = FP / (FP + TP)
# Accuracy for each class
ACC = (TP + TN) / (TP + FP + FN + TN)
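A minimal end-to-end sketch of how the block above can be applied (an assumption for illustration: the matrix is built with scikit-learn's confusion_matrix and wrapped in a pandas DataFrame so that .values.sum() works; the toy labels are only an example):

import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
labels = ["ant", "bird", "cat"]

# square matrix: true classes as rows, predicted classes as columns
cm = pd.DataFrame(confusion_matrix(y_true, y_pred, labels=labels),
                  index=labels, columns=labels)

FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)
TN = cm.values.sum() - (FP + FN + TP)
print(TP, FP, FN, TN)  # one count per class, in `labels` order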
You can obtain all of the parameters from the confusion matrix. For a binary problem the confusion matrix is a 2x2 matrix with the following structure (assuming the first index corresponds to the positive label and the rows correspond to the true labels):
TP | FN
FP | TN
So
TP = cm[0][0]
FN = cm[0][1]
FP = cm[1][0]
TN = cm[1][1]
More details at https://en.wikipedia.org/wiki/Confusion_matrix
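Note that this layout assumes the positive label comes first. scikit-learn's own confusion_matrix orders the labels ascending, so for 0/1 labels its layout is [[TN, FP], [FN, TP]] and the four counts can be unpacked directly with ravel(); a small sketch using example 0/1 arrays:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# with sorted labels [0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 4 3 1 3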
In scikit-learn version 0.22, you can do it like this:
from sklearn.metrics import multilabel_confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
tn = mcm[:, 0, 0]
tp = mcm[:, 1, 1]
fn = mcm[:, 1, 0]
fp = mcm[:, 0, 1]
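Each of these is an array with one entry per label, in the order given by labels. As a small continuation of the snippet above, the per-class counts can be laid out side by side (the commented values are what this particular example produces):

import pandas as pd

counts = pd.DataFrame({"TN": tn, "FP": fp, "FN": fn, "TP": tp},
                      index=["ant", "bird", "cat"])
print(counts)
#       TN  FP  FN  TP
# ant    3   1   0   2
# bird   5   0   1   0
# cat    2   1   1   2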
I think neither of the answers is fully correct. For example, suppose we have the following arrays:
y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
If we compute the FP, FN, TP and TN values manually, they should be as follows:
FP: 3 FN: 1 TP: 3 TN: 4
However, if we use the first answer, results are given as follows:
FP: 1 FN: 3 TP: 3 TN: 4
They are not correct, because in the first answer a False Positive should be counted where the actual value is 0 but the predicted value is 1, not the other way around. The same applies to False Negative.
And, if we use the second answer, the results are computed as follows:
FP: 3 FN: 1 TP: 4 TN: 3
The True Positive and True Negative numbers are not correct; they should be swapped.
Am I correct with my computations? Please let me know if I am missing something.
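A quick way to double-check the manual counts is a direct loop over the two arrays; this is just a sketch with a hypothetical helper named counts(), implementing exactly the definitions above (FP where actual is 0 and predicted is 1, FN where actual is 1 and predicted is 0):

def counts(y_actual, y_predic):
    TP = FP = TN = FN = 0
    for a, p in zip(y_actual, y_predic):
        if a == 1 and p == 1:
            TP += 1
        elif a == 0 and p == 1:
            FP += 1
        elif a == 1 and p == 0:
            FN += 1
        else:
            TN += 1
    return FP, FN, TP, TN

print(counts(y_actual, y_predic))  # (3, 1, 3, 4), i.e. FP: 3, FN: 1, TP: 3, TN: 4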
You can try sklearn.metrics.classification_report as below:

from sklearn.metrics import classification_report

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(classification_report(y_true, y_pred))
output:
             precision    recall  f1-score   support

          0       0.80      0.57      0.67         7
          1       0.50      0.75      0.60         4

avg / total       0.69      0.64      0.64        11
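If you need the numbers programmatically rather than as printed text, classification_report also accepts output_dict=True (available in scikit-learn 0.20 and later); the class labels become string keys of the returned dict:

from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["precision"], report["1"]["recall"])  # 0.5 0.75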
Just in case someone is looking for the same in a multi-class example:
def perf_measure(y_actual, y_pred):
    class_id = sorted(set(y_actual).union(set(y_pred)))
    TP = [0] * len(class_id)
    FP = [0] * len(class_id)
    TN = [0] * len(class_id)
    FN = [0] * len(class_id)

    for index, _id in enumerate(class_id):
        for i in range(len(y_pred)):
            if y_actual[i] == y_pred[i] == _id:
                TP[index] += 1   # predicted _id and it really is _id
            if y_pred[i] == _id and y_actual[i] != _id:
                FP[index] += 1   # predicted _id but it is another class
            if y_actual[i] != _id and y_pred[i] != _id:
                TN[index] += 1   # neither actual nor predicted is _id
            if y_actual[i] == _id and y_pred[i] != _id:
                FN[index] += 1   # it is _id but was predicted as another class

    return class_id, TP, FP, TN, FN
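Example usage with the binary arrays from earlier in the thread (the counts come back as one list per class, in the sorted order of class_id):

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

class_id, TP, FP, TN, FN = perf_measure(y_actual, y_predic)
print(class_id)            # [0, 1]
print(TP, FP, TN, FN)      # [4, 3] [1, 3] [3, 4] [3, 1]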