How to plot a ROC curve in Python

臣服心动 2020-11-29 16:15

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive ra

11 Answers
  • 2020-11-29 16:42

    Here are two ways you may try, assuming your model is an sklearn predictor:

    import sklearn.metrics as metrics
    # calculate the fpr and tpr for all thresholds of the classification
    probs = model.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    roc_auc = metrics.auc(fpr, tpr)
    
    # method I: plt
    import matplotlib.pyplot as plt
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
    plt.legend(loc = 'lower right')
    plt.plot([0, 1], [0, 1],'r--')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()
    
    # method II: ggplot (needs the separate ggplot package, plus pandas for the DataFrame)
    import pandas as pd
    from ggplot import *
    df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
    ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')
    

    or try

    ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc)) 
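
    For reference, `metrics.auc(fpr, tpr)` integrates the curve with the trapezoidal rule. The sketch below reproduces that calculation in plain Python so the number shown in the legend is easier to interpret. The function name `trapezoid_auc` and the example points are made up for illustration and are not part of sklearn.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given as (fpr, tpr) points, by the trapezoidal rule.

    Assumes fpr is sorted in non-decreasing order, as roc_curve returns it.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area


# Hypothetical ROC points, not taken from any real model.
example_fpr = [0.0, 0.0, 0.5, 1.0]
example_tpr = [0.0, 0.5, 1.0, 1.0]
example_auc = trapezoid_auc(example_fpr, example_tpr)
print("AUC = %0.3f" % example_auc)
```

    A diagonal curve from (0, 0) to (1, 1) gives 0.5 under this rule, which is why the dashed red line in method I is the usual "no skill" baseline.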
    
  • 2020-11-29 16:47

    Here is Python code for computing the ROC curve and showing it as a scatter plot:

    import matplotlib.pyplot as plt
    import numpy as np
    
    score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
    y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])
    
    # false positive rate
    fpr = []
    # true positive rate
    tpr = []
    # Iterate thresholds from 0.0, 0.01, ... 1.0
    thresholds = np.arange(0.0, 1.01, .01)
    
    # get number of positive and negative examples in the dataset
    P = sum(y)
    N = len(y) - P
    
    # iterate through all thresholds and determine fraction of true positives
    # and false positives found at this threshold
    for thresh in thresholds:
        FP=0
        TP=0
        for i in range(len(score)):
            if (score[i] > thresh):
                if y[i] == 1:
                    TP = TP + 1
                if y[i] == 0:
                    FP = FP + 1
        fpr.append(FP/float(N))
        tpr.append(TP/float(P))
    
    plt.scatter(fpr, tpr)
    plt.show()
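
    The loop above rescans every sample for each of the 101 thresholds. As a possible alternative, the sketch below sorts the samples by score once and emits one ROC point per distinct score. It assumes Python 3, 0/1 labels with both classes present, and a `>=` comparison against the threshold (the loop above uses `>`). The name `roc_points` and the toy data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists, one point per distinct score threshold.

    A sample counts as predicted positive when its score >= threshold.
    Assumes labels are 0/1 and both classes are present.
    """
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    # Visit samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Start with a threshold above every score: nothing is predicted positive.
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        is_last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if is_last_of_tie:
            fpr.append(fp / num_neg)
            tpr.append(tp / num_pos)
    return fpr, tpr


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
print(list(zip(toy_fpr, toy_tpr)))
```

    The returned lists can be passed to `plt.plot` or `plt.scatter` in the same way as `fpr` and `tpr` above.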
    
  • 2020-11-29 16:47

    You can also follow the official scikit-learn documentation:

    https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

  • 2020-11-29 16:48

    ROC curve (with AUC) for binary classification using matplotlib

    from sklearn import metrics
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer
    import matplotlib.pyplot as plt
    

    Load Breast Cancer Dataset

    breast_cancer = load_breast_cancer()
    
    X = breast_cancer.data
    y = breast_cancer.target
    

    Split the Dataset

    X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)
    

    Model

    clf = LogisticRegression(penalty='l2', C=0.1)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    

    Accuracy

    print("Accuracy", metrics.accuracy_score(y_test, y_pred))
    

    ROC Curve and AUC

    # probability of the positive class (column 1) for each test sample
    y_pred_proba = clf.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)
    auc = metrics.roc_auc_score(y_test, y_pred_proba)
    plt.plot(fpr, tpr, label="data 1, auc=" + str(auc))
    plt.legend(loc=4)
    plt.show()
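
    One way to read the AUC value in the legend: it equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below checks that interpretation by brute force on made-up data. `pairwise_auc` is a hypothetical helper, not an sklearn function, and it is quadratic in the number of samples, so it is only meant for small inputs.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes 0/1 labels, both classes present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels, not output from the model above.
demo_scores = [0.9, 0.8, 0.7, 0.6]
demo_labels = [1, 0, 1, 0]
demo_auc = pairwise_auc(demo_scores, demo_labels)
print("pairwise AUC =", demo_auc)
```

    Here three of the four positive/negative pairs are ordered correctly, so the result is 0.75.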
    

  • 2020-11-29 16:48

    There is a library called metriculous that will do that for you:

    $ pip install metriculous
    

    Let's first mock some data. In practice it would come from the test dataset and the model(s):

    import numpy as np
    
    def normalize(array2d: np.ndarray) -> np.ndarray:
        return array2d / array2d.sum(axis=1, keepdims=True)
    
    class_names = ["Cat", "Dog", "Pig"]
    num_classes = len(class_names)
    num_samples = 500
    
    # Mock ground truth
    ground_truth = np.random.choice(range(num_classes), size=num_samples, p=[0.5, 0.4, 0.1])
    
    # Mock model predictions
    perfect_model = np.eye(num_classes)[ground_truth]
    noisy_model = normalize(
        perfect_model + 2 * np.random.random((num_samples, num_classes))
    )
    random_model = normalize(np.random.random((num_samples, num_classes)))
    

    Now we can use metriculous to generate a table with various metrics and diagrams, including ROC curves:

    import metriculous
    
    metriculous.compare_classifiers(
        ground_truth=ground_truth,
        model_predictions=[perfect_model, noisy_model, random_model],
        model_names=["Perfect Model", "Noisy Model", "Random Model"],
        class_names=class_names,
        one_vs_all_figures=True, # This line is important to include ROC curves in the output
    ).save_html("model_comparison.html").display()
    

    The output includes the ROC curves (figure not preserved here).

    The plots are zoomable and draggable, and you get further details when hovering with your mouse over the plot:
