Question
I would like to calculate AUC, precision, and accuracy for my classifier. I am doing supervised learning.
Here is my working code. It works fine for the binary case, but not for multiclass. Please assume that you have a dataframe with binary classes:
from sklearn.cross_validation import StratifiedKFold  # pre-0.18 API: StratifiedKFold(y, n_folds)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, recall_score, precision_score

sample_features_dataframe = self._get_sample_features_dataframe()
labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

k = 10
roc = accuracy = recall = precision = 0.0  # metric totals, accumulated per fold
k_folds = StratifiedKFold(binary_class_series, k)
for train_indexes, test_indexes in k_folds:
    train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
    test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]
    train_class = binary_class_series[train_indexes]
    test_class = binary_class_series[test_indexes]
    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)
    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    roc += roc_auc_score(test_class, predictions_proba[:, 1])
    accuracy += accuracy_score(test_class, predictions)
    recall += recall_score(test_class, predictions)
    precision += precision_score(test_class, predictions)
At the end I divide each accumulated total by k to get the average AUC, precision, and so on. This code works fine for the binary case.
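That final averaging step, written out explicitly (a minimal sketch; the mean_* names are just for illustration):

mean_roc = roc / k
mean_accuracy = accuracy / k
mean_recall = recall / k
mean_precision = precision / k

However, I cannot calculate the same metrics for the multiclass case: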
# inside the same cross-validation loop, now using the multiclass target:
train_class = multi_class_series[train_indexes]
test_class = multi_class_series[test_indexes]
selected_classifier = RandomForestClassifier(n_estimators=100)
selected_classifier.fit(train_set_dataframe, train_class)
predictions = selected_classifier.predict(test_set_dataframe)
predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
I found that for multiclass I have to add the average="weighted" parameter:
roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")
I got an error:
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
Answer 1:
The average option of roc_auc_score is only defined for multilabel problems.
You can take a look at the following example from the scikit-learn documentation to define your own micro- or macro-averaged scores for multiclass problems:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
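For reference, here is a minimal sketch of the micro- and macro-averaged computation from that example; y_test and y_score (the per-class probability matrix, one column per class) are assumed to come from your own model, and more than two classes are assumed:

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

# turn the integer labels into a one-hot indicator matrix
classes = np.unique(y_test)
y_test_bin = label_binarize(y_test, classes=classes)

# micro-average: pool every (sample, class) decision into one binary problem
micro_auc = roc_auc_score(y_test_bin.ravel(), y_score.ravel())

# macro-average: unweighted mean of the per-class one-vs-rest AUCs
macro_auc = np.mean([roc_auc_score(y_test_bin[:, i], y_score[:, i])
                     for i in range(len(classes))])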
Edit: there is an issue on the scikit-learn tracker to implement ROC AUC for multiclass problems: https://github.com/scikit-learn/scikit-learn/issues/3298
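(Note: multiclass support has since landed in scikit-learn. With version 0.22 or later, roc_auc_score accepts multiclass targets directly via the multi_class argument, so the question's loop could use something like:)

from sklearn.metrics import roc_auc_score

# requires scikit-learn >= 0.22; predictions_proba has one column per class,
# ordered as in selected_classifier.classes_
roc += roc_auc_score(test_class, predictions_proba,
                     multi_class="ovr", average="weighted")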
Answer 2:
You can't use roc_auc as a single summary metric for multiclass models. If you want, you could calculate per-class roc_auc, as follows:
roc = {label: [] for label in multi_class_series.unique()}
for label in multi_class_series.unique():
    # fit a one-vs-rest classifier for this label
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # column 1 holds the probability of the positive ("this label") class
    roc[label].append(roc_auc_score(test_class == label,
                                    predictions_proba[:, 1]))
However, it's more usual to use sklearn.metrics.confusion_matrix to evaluate the performance of a multiclass model.
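For instance, a quick sketch of that approach, reusing test_class and predictions from the question's loop:

from sklearn.metrics import confusion_matrix

# rows are true labels, columns are predicted labels
print(confusion_matrix(test_class, predictions))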
Answer 3:
As mentioned here, to the best of my knowledge there is not yet a way to easily compute ROC AUC for multiclass settings natively in sklearn. However, if you are familiar with classification_report, you may like this simple implementation that returns the same output as classification_report, but as a pandas.DataFrame, which I personally find very handy:
import pandas as pd
import numpy as np
from numpy import interp  # the old "from scipy import interp" alias is deprecated
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import LabelBinarizer

def class_report(y_true, y_pred, y_score=None, average='micro'):
    if y_true.shape != y_pred.shape:
        print("Error! y_true %s is not the same shape as y_pred %s" % (
            y_true.shape,
            y_pred.shape)
        )
        return

    lb = LabelBinarizer()
    if len(y_true.shape) == 1:
        lb.fit(y_true)

    # Value counts of predictions
    labels, cnt = np.unique(
        y_pred,
        return_counts=True)
    n_classes = len(labels)
    pred_cnt = pd.Series(cnt, index=labels)

    metrics_summary = precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        labels=labels)

    avg = list(precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index,
        columns=labels)

    support = class_report_df.loc['support']
    total = support.sum()
    class_report_df['avg / total'] = avg[:-1] + [total]

    class_report_df = class_report_df.T
    class_report_df['pred'] = pred_cnt
    class_report_df['pred'].iloc[-1] = total

    if not (y_score is None):
        fpr = dict()
        tpr = dict()
        roc_auc = dict()
        for label_it, label in enumerate(labels):
            fpr[label], tpr[label], _ = roc_curve(
                (y_true == label).astype(int),
                y_score[:, label_it])
            roc_auc[label] = auc(fpr[label], tpr[label])

        if average == 'micro':
            if n_classes <= 2:
                fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
                    lb.transform(y_true).ravel(),
                    y_score[:, 1].ravel())
            else:
                fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
                    lb.transform(y_true).ravel(),
                    y_score.ravel())

            roc_auc["avg / total"] = auc(
                fpr["avg / total"],
                tpr["avg / total"])

        elif average == 'macro':
            # First aggregate all false positive rates
            all_fpr = np.unique(np.concatenate([
                fpr[i] for i in labels]
            ))
            # Then interpolate all ROC curves at these points
            mean_tpr = np.zeros_like(all_fpr)
            for i in labels:
                mean_tpr += interp(all_fpr, fpr[i], tpr[i])
            # Finally average it and compute AUC
            mean_tpr /= n_classes
            fpr["macro"] = all_fpr
            tpr["macro"] = mean_tpr
            roc_auc["avg / total"] = auc(fpr["macro"], tpr["macro"])

        class_report_df['AUC'] = pd.Series(roc_auc)

    return class_report_df
Here is an example:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=10,
                           n_informative=5, n_redundant=0,
                           n_classes=10, random_state=0,
                           shuffle=False)

X_train, X_test, y_train, y_test = train_test_split(X, y)
model = RandomForestClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)
Regular classification_report:
sk_report = classification_report(
    digits=6,
    y_true=y_test,
    y_pred=model.predict(X_test))
print(sk_report)
Out:
             precision    recall  f1-score   support

          0   0.262774  0.553846  0.356436       130
          1   0.405405  0.333333  0.365854       135
          2   0.367347  0.150000  0.213018       120
          3   0.350993  0.424000  0.384058       125
          4   0.379310  0.447154  0.410448       123
          5   0.525000  0.182609  0.270968       115
          6   0.362573  0.488189  0.416107       127
          7   0.330189  0.299145  0.313901       117
          8   0.328571  0.407080  0.363636       113
          9   0.571429  0.248276  0.346154       145

avg / total   0.390833  0.354400  0.345438      1250
Custom classification_report:
report_with_auc = class_report(
    y_true=y_test,
    y_pred=model.predict(X_test),
    y_score=model.predict_proba(X_test))
print(report_with_auc)
Out:
             precision    recall  f1-score  support    pred       AUC
0             0.262774  0.553846  0.356436    130.0   274.0  0.766477
1             0.405405  0.333333  0.365854    135.0   111.0  0.773974
2             0.367347  0.150000  0.213018    120.0    49.0  0.817341
3             0.350993  0.424000  0.384058    125.0   151.0  0.803364
4             0.379310  0.447154  0.410448    123.0   145.0  0.802436
5             0.525000  0.182609  0.270968    115.0    40.0  0.680870
6             0.362573  0.488189  0.416107    127.0   171.0  0.855768
7             0.330189  0.299145  0.313901    117.0   106.0  0.766526
8             0.328571  0.407080  0.363636    113.0   140.0  0.754812
9             0.571429  0.248276  0.346154    145.0    63.0  0.769100
avg / total   0.390833  0.354400  0.345438   1250.0  1250.0  0.776071
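The average argument controls how the "avg / total" AUC row is computed (micro by default); the macro variant is selected the same way:

report_macro = class_report(
    y_true=y_test,
    y_pred=model.predict(X_test),
    y_score=model.predict_proba(X_test),
    average='macro')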
Answer 4:
If you are looking for something relatively simple that takes in the actual and predicted lists and returns a dictionary with all the classes as keys and their roc_auc_score as values, you can use the following method:
from sklearn.metrics import roc_auc_score

def roc_auc_score_multiclass(actual_class, pred_class, average="macro"):
    # creating a set of all the unique classes using the actual class list
    unique_class = set(actual_class)
    roc_auc_dict = {}
    for per_class in unique_class:
        # creating a list of all the classes except the current class
        other_class = [x for x in unique_class if x != per_class]
        # marking the current class as 1 and all other classes as 0
        new_actual_class = [0 if x in other_class else 1 for x in actual_class]
        new_pred_class = [0 if x in other_class else 1 for x in pred_class]
        # using the sklearn metrics method to calculate the roc_auc_score
        roc_auc = roc_auc_score(new_actual_class, new_pred_class, average=average)
        roc_auc_dict[per_class] = roc_auc
    return roc_auc_dict
print("\nLogistic Regression")
# assuming you already have lists actual_class and predicted_class from the logistic regression classifier
lr_roc_auc_multiclass = roc_auc_score_multiclass(actual_class, predicted_class)
print(lr_roc_auc_multiclass)

# Sample output
# Logistic Regression
# {0: 0.5087457159427196, 1: 0.5, 2: 0.5, 3: 0.5114706737345112, 4: 0.5192307692307693}
# average of the above per-class AUCs: 0.5078894317816
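One caveat worth flagging: this helper binarizes the hard predicted labels rather than scores, so each AUC is computed from a single operating point. A sketch of a probability-based variant (the roc_auc_score_multiclass_proba name and class_order parameter are illustrative, not from the original answer):

import numpy as np
from sklearn.metrics import roc_auc_score

def roc_auc_score_multiclass_proba(actual_class, pred_proba, class_order):
    # one-vs-rest AUC per class, using that class's predicted-probability column
    roc_auc_dict = {}
    for idx, per_class in enumerate(class_order):
        new_actual = np.asarray(actual_class) == per_class
        roc_auc_dict[per_class] = roc_auc_score(new_actual, pred_proba[:, idx])
    return roc_auc_dict

# e.g. with an sklearn classifier:
# roc_auc_score_multiclass_proba(y_test, clf.predict_proba(X_test), clf.classes_)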
Answer 5:
Updating maxymoo's answer:
roc[label] += roc_auc_score(test_class, predictions_proba[:, label])
Or refer to the classifier's classes_ attribute to find the right probability column for the label of interest.
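A minimal sketch of that classes_ lookup (assuming the classifier was fit on the full multiclass target, and reusing the names from the earlier answers):

import numpy as np
from sklearn.metrics import roc_auc_score

# find the predict_proba column that corresponds to the label of interest
label_column = np.where(selected_classifier.classes_ == label)[0][0]
roc[label].append(roc_auc_score(test_class == label,
                                predictions_proba[:, label_column]))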
Answer 6:
@Raul, your function looks good, but there is a problem when it computes the micro-average roc_score with n_classes <= 2. I was having dimension issues, so I changed the following:
from this:
if average == 'micro':
    if n_classes <= 2:
        fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
            lb.transform(y_true).ravel(),
            y_score[:, 1].ravel())
to this:
if average == 'micro':
    if n_classes <= 2:
        fpr["avg / total"], tpr["avg / total"], _ = roc_curve(
            lb.transform(y_true).ravel(),
            y_score.ravel())
I hope this change does not create problems in the calculation of roc_score.
Source: https://stackoverflow.com/questions/39685740/calculate-sklearn-roc-auc-score-for-multi-class