How to plot ROC curve in Python

臣服心动 2020-11-29 16:15

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive ra

11 answers
  • 2020-11-29 16:21

    Based on several Stack Overflow comments, the scikit-learn documentation and a few other sources, I made a Python package that plots ROC curves (and other metrics) in a really simple way.

    To install the package: pip install plot-metric (more info at the end of the post)

    To plot a ROC curve (the example comes from the documentation):

    Binary classification

    Let's load a simple dataset and split it into train and test sets:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)
    

    Train a classifier and predict on the test set:

    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=50, random_state=23)
    model = clf.fit(X_train, y_train)
    
    # Use predict_proba to predict probability of the class
    y_pred = clf.predict_proba(X_test)[:,1]
    

    You can now use plot_metric to plot the ROC curve:

    import matplotlib.pyplot as plt
    from plot_metric.functions import BinaryClassification
    # Visualisation with plot_metric
    bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])
    
    # Figures
    plt.figure(figsize=(5,5))
    bc.plot_roc_curve()
    plt.show()
    

    Result :

    You can find more examples on the GitHub page and in the documentation of the package:

    • Github : https://github.com/yohann84L/plot_metric
    • Documentation : https://plot-metric.readthedocs.io/en/latest/
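
If you would rather not install an extra package, the same binary example can be sketched with plain scikit-learn and matplotlib. This is only a rough equivalent under my own assumptions, not the plot_metric implementation: the names `y_score` and `roc_auc` and the styling choices are mine, and the `weights` argument is left at its default.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same kind of synthetic data and split as in the answer above
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

clf = RandomForestClassifier(n_estimators=50, random_state=23).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]  # column 1 = probability of class 1

# ROC points and the area under the curve
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = roc_auc_score(y_test, y_score)

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # diagonal reference line
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend(loc="lower right")
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to display or save the figure.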
  • 2020-11-29 16:26

    It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as:

    import matplotlib.pyplot as plt
    import numpy as np
    
    x = false_positive_rate  # array of false positive rates
    y = true_positive_rate   # array of true positive rates
    
    # This is the ROC curve
    plt.plot(x,y)
    plt.show() 
    
    # This is the AUC (np.trapz is named np.trapezoid in NumPy 2.0 and later)
    auc = np.trapz(y,x)
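
As a quick sanity check of the trapezoid approach, here is a small sketch on made-up data (the four labels and scores are illustrative, not from the question). It compares the NumPy integral with `sklearn.metrics.auc`, which applies the same trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy labels and scores, invented purely for illustration
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, _ = roc_curve(y_true, y_score)

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever is available
trapezoid = getattr(np, "trapezoid", None) or np.trapz
area_numpy = trapezoid(tpr, fpr)   # integrate TPR over FPR
area_sklearn = auc(fpr, tpr)       # scikit-learn's trapezoidal AUC

print(area_numpy, area_sklearn)
```

Both values should agree, since they integrate the same piecewise-linear curve.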
    
  • 2020-11-29 16:27

    The previous answers assume that you have already calculated the true positive rate (sensitivity) yourself. Doing this manually is a bad idea because it is easy to make mistakes in the calculations; use a library function for all of this instead.

    The plot_roc example in the scikit-learn documentation does exactly what you need: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

    The essential part of the code is:

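    from sklearn.metrics import roc_curve, auc
    # y_test: binarized labels, y_score: per-class scores, both (n_samples, n_classes)
    fpr, tpr, roc_auc = {}, {}, {}
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])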
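
The loop above needs a one-column-per-class label matrix and a matching score matrix. A minimal self-contained sketch of that setup follows. It assumes the iris dataset and a logistic regression model purely for illustration; `y_test_labels` is a name I introduced, and the linked scikit-learn example may prepare its data differently.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test_labels = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One 0/1 column per class, in the same column order as y_score
y_test = label_binarize(y_test_labels, classes=clf.classes_)
n_classes = y_test.shape[1]

# One-vs-rest ROC curve and AUC for each class
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i in range(n_classes):
    print(f"class {i}: AUC = {roc_auc[i]:.3f}")
```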
    
  • 2020-11-29 16:28

    This is the simplest way to plot an ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots the ROC curve for all classes, so you get multiple neat-looking curves as well.

    import scikitplot as skplt
    import matplotlib.pyplot as plt
    
    # y_true: ground truth labels
    # y_probas: predicted probabilities generated by an sklearn classifier
    skplt.metrics.plot_roc_curve(y_true, y_probas)
    plt.show()
    

    Here's a sample curve generated by plot_roc_curve. I used the sample digits dataset from scikit-learn so there are 10 classes. Notice that one ROC curve is plotted for each class.

    Disclaimer: Note that this uses the scikit-plot library, which I built.

  • 2020-11-29 16:32
    from sklearn import metrics
    import numpy as np
    import matplotlib.pyplot as plt
    
    # y_true: true labels
    # y_probas: predicted scores for the class passed as pos_label
    fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)
    
    # Plot ROC curve
    plt.plot(fpr,tpr)
    plt.show() 
    
    # Print AUC
    auc = np.trapz(tpr,fpr)
    print('AUC:', auc)
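
One thing to watch in the snippet above is `pos_label`: the scores have to be scores for that same class. The sketch below uses made-up labels and probabilities (my own illustrative data) to show that passing class-1 probabilities together with `pos_label=0` mirrors the curve, giving 1 minus the usual AUC.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy data: y_prob_class1 is the predicted probability of class 1
y_true = np.array([0, 0, 1, 1])
y_prob_class1 = np.array([0.1, 0.4, 0.35, 0.8])

# Scores and pos_label refer to the same class
fpr1, tpr1, _ = roc_curve(y_true, y_prob_class1, pos_label=1)
auc_pos1 = auc(fpr1, tpr1)

# Class-1 scores, but class 0 treated as positive: the curve is mirrored
fpr0, tpr0, _ = roc_curve(y_true, y_prob_class1, pos_label=0)
auc_pos0 = auc(fpr0, tpr0)

# To evaluate class 0 as the positive class, pass the class-0 probability
fpr0b, tpr0b, _ = roc_curve(y_true, 1 - y_prob_class1, pos_label=0)
auc_pos0_flipped = auc(fpr0b, tpr0b)

print(auc_pos1, auc_pos0, auc_pos0_flipped)
```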
    
  • 2020-11-29 16:33

    I have written a simple function, included in a package, for plotting the ROC curve. I have just started practicing machine learning, so please let me know if this code has any problems!

    Have a look at the github readme file for more details! :)

    https://github.com/bc123456/ROC

    from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
    import matplotlib.pyplot as plt
    import numpy as np
    
    def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
        '''
        A function to plot the ROC curve for train labels and test labels.
        Use the best threshold found in train set to classify items in test set.
        '''
        fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label =True)
        sum_sensitivity_specificity_train = tpr_train + (1-fpr_train)
        best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
        best_threshold = thresholds_train[best_threshold_id_train]
        best_fpr_train = fpr_train[best_threshold_id_train]
        best_tpr_train = tpr_train[best_threshold_id_train]
        # roc_curve counts a sample as positive when score >= threshold
        y_train = y_train_prob >= best_threshold
    
        cm_train = confusion_matrix(y_train_true, y_train)
        acc_train = accuracy_score(y_train_true, y_train)
        # AUC is computed from the probabilities, not the thresholded labels
        auc_train = roc_auc_score(y_train_true, y_train_prob)
    
        print('Train Accuracy: %s ' % acc_train)
        print('Train AUC: %s ' % auc_train)
        print('Train Confusion Matrix:')
        print(cm_train)
    
        fig = plt.figure(figsize=(10,5))
        ax = fig.add_subplot(121)
        curve1 = ax.plot(fpr_train, tpr_train)
        curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
        dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
        ax.text(best_fpr_train, best_tpr_train, s = '(%.3f,%.3f)' %(best_fpr_train, best_tpr_train))
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.0])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('ROC curve (Train), AUC = %.4f'%auc_train)
    
        fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label =True)
    
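        # Apply the threshold chosen on the train set (inclusive, as in roc_curve)
        y_test = y_test_prob >= best_threshold
    
        cm_test = confusion_matrix(y_test_true, y_test)
        acc_test = accuracy_score(y_test_true, y_test)
        # AUC is computed from the probabilities, not the thresholded labels
        auc_test = roc_auc_score(y_test_true, y_test_prob)
    
        print('Test Accuracy: %s ' % acc_test)
        print('Test AUC: %s ' % auc_test)
        print('Test Confusion Matrix:')
        print(cm_test)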
    
        tpr_score = float(cm_test[1][1])/(cm_test[1][1] + cm_test[1][0])
        fpr_score = float(cm_test[0][1])/(cm_test[0][0]+ cm_test[0][1])
    
        ax2 = fig.add_subplot(122)
        curve1 = ax2.plot(fpr_test, tpr_test)
        curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
        dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
        ax2.text(fpr_score, tpr_score, s = '(%.3f,%.3f)' %(fpr_score, tpr_score))
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.0])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('ROC curve (Test), AUC = %.4f'%auc_test)
        plt.savefig('ROC', dpi = 500)
        plt.show()
    
        return best_threshold
    

    A sample roc graph produced by this code
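
The threshold selection in the function above (maximizing sensitivity + specificity, which is equivalent to maximizing TPR - FPR) can be tried in isolation. The sketch below uses made-up labels and probabilities; the names `j_scores`, `tpr_check` and `fpr_check` are mine. It also checks that applying the threshold with `>=` reproduces the ROC point that was selected.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data, invented for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.65, 0.6, 0.7, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# tpr + (1 - fpr) and tpr - fpr differ by a constant, so the argmax is the same
j_scores = tpr - fpr
best_idx = np.argmax(j_scores)
best_threshold = thresholds[best_idx]

# roc_curve treats score >= threshold as a positive prediction
y_pred = y_prob >= best_threshold
tpr_check = y_pred[y_true == 1].mean()  # fraction of positives predicted positive
fpr_check = y_pred[y_true == 0].mean()  # fraction of negatives predicted positive

print(best_threshold, tpr_check, fpr_check)
```

With a strict `>` comparison, the sample whose score equals the threshold would be classified as negative, and the recomputed rates would no longer match the point on the curve.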
