ROC curve with Leave-One-Out Cross validation in sklearn

梦如初夏 2021-01-22 17:13

I want to plot a ROC curve of a classifier using leave-one-out cross validation.

It seems that a similar question has been asked here b

1 answer
  • 2021-01-22 17:23

    I believe both the code and the splitting are correct. I've added a few lines to validate the implementation and the results:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import cross_val_score, cross_val_predict, KFold, LeaveOneOut, StratifiedKFold
    from sklearn.metrics import roc_curve, auc
    from sklearn.svm import SVC
    from sklearn import datasets
    
    # Import some data to play with
    iris = datasets.load_iris()
    X_svc = iris.data
    y = iris.target
    X_svc, y = X_svc[y != 2], y[y != 2]
    
    clf = SVC(kernel='linear', class_weight='balanced', probability=True, random_state=0)
    kf = LeaveOneOut()
    if kf.get_n_splits(X_svc) == len(X_svc):
        print("They are the same length, splitting correct")
    else:
        print("Something is wrong")
    all_y = []
    all_probs=[]
    for train, test in kf.split(X_svc, y):
        all_y.append(y[test])
        all_probs.append(clf.fit(X_svc[train], y[train]).predict_proba(X_svc[test])[:,1])
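    # Each fold yields a length-1 array; concatenate them into flat 1-D arrays for roc_curve
    all_y = np.concatenate(all_y)
    all_probs = np.concatenate(all_probs)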
    #print(all_y) #For validation 
    #print(all_probs) #For validation
    
    fpr, tpr, thresholds = roc_curve(all_y,all_probs)
    print(fpr, tpr, thresholds) #For validation
    roc_auc = auc(fpr, tpr)
    plt.figure(1, figsize=(12,6))
    plt.plot(fpr, tpr, lw=2, alpha=0.5, label='LOOCV ROC (AUC = %0.2f)' % (roc_auc))
    plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k', label='Chance level', alpha=.8)
    plt.xlim([-0.05, 1.05])
    plt.ylim([-0.05, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    plt.grid()
    plt.show()
    

    The if check only confirms that the data is split n times, where n is the number of observations in the dataset. As the documentation states, LeaveOneOut() is equivalent to KFold(n_splits=n) and LeavePOut(p=1). I also printed the predicted probabilities and they looked sensible, which is consistent with the curve. Congrats on your 1.00 AUC!
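
If you want to see the equivalence for yourself, here is a small sketch that compares the held-out indices produced by the three splitters. The toy array X and the helper held_out_indices are made up for illustration and are not part of the code above.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, LeavePOut

# Toy data (an assumption for illustration): 6 samples, 2 features
X = np.arange(12).reshape(6, 2)

def held_out_indices(splitter, X):
    """Return the sorted list of held-out index tuples for a splitter."""
    return sorted(tuple(int(i) for i in test) for _, test in splitter.split(X))

loo = held_out_indices(LeaveOneOut(), X)
kfold = held_out_indices(KFold(n_splits=len(X)), X)  # n_splits equals n samples
lpo = held_out_indices(LeavePOut(p=1), X)

# All three should hold out each sample exactly once
print(loo == kfold == lpo)
print(LeaveOneOut().get_n_splits(X) == len(X))
```

Both lines should print True if the splitters agree, which is the same thing the if check in the answer relies on.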
