How to get comparable and reproducible results from LogisticRegressionCV and GridSearchCV

日久生厌 2021-02-05 22:18

I want to score different classifiers with different parameters.

For speed, I use LogisticRegressionCV for LogisticRegression (it is at least 2x faster than running LogisticRegression through GridSearchCV), but the scores it reports do not match those from GridSearchCV.

1 Answer
  • 2021-02-05 22:40

    Here is a copy of the answer by Tom on the scikit-learn issue tracker:

    LogisticRegressionCV.scores_ gives the scores for all the folds individually. GridSearchCV.best_score_ gives the best mean score over all the folds.

    To get the same result, you need to change your code:

    print('Max auc_roc:', searchCV.scores_[1].max())               # wrong: max over individual fold/C cells
    print('Max auc_roc:', searchCV.scores_[1].mean(axis=0).max())  # correct: mean over folds first, then max over Cs
    
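    (A small aside, not part of Tom's answer: scores_[1] is the fold-by-C grid of scores for the positive class, so .max() picks the single best fold/C cell, while .mean(axis=0).max() first averages over the folds, which is what best_score_ reports. A minimal illustration of the two aggregations, using made-up fold scores:)

    import numpy as np

    # Hypothetical stand-in for searchCV.scores_[1]: rows are CV folds, columns are values of C.
    fold_scores = np.array([
        [0.91, 0.94, 0.93],   # fold 1
        [0.89, 0.92, 0.95],   # fold 2
        [0.90, 0.93, 0.91],   # fold 3
    ])

    print(fold_scores.max())               # ~0.95 -- best single fold/C cell (the "wrong" comparison)
    print(fold_scores.mean(axis=0).max())  # ~0.93 -- best mean-over-folds score, like GridSearchCV.best_score_
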

    By also using the default tol=1e-4 instead of your tol=10, I get:

    ('gs.best_score_:', 0.939162082193857)
    ('Max auc_roc:', 0.93915947999923843)
    

    The (small) remaining difference might come from warm starting in LogisticRegressionCV (which is actually what makes it faster than GridSearchCV).
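
    To make the two directly comparable and reproducible in practice, one approach (a sketch of my own, not from the quoted answer) is to give both searches the same Cs grid, the same CV splitter with a fixed random_state, the same scoring metric and the same tol; X and y below are placeholder data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    X, y = make_classification(n_samples=1000, random_state=0)      # placeholder data
    Cs = np.logspace(-4, 4, 10)                                      # shared grid of C values
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # identical folds for both searches

    # LogisticRegressionCV: scores_[1] has shape (n_folds, len(Cs))
    lr_cv = LogisticRegressionCV(Cs=Cs, cv=cv, scoring='roc_auc', tol=1e-4, max_iter=1000)
    lr_cv.fit(X, y)
    print('LogisticRegressionCV best mean AUC:', lr_cv.scores_[1].mean(axis=0).max())

    # GridSearchCV over the same grid, folds and metric
    gs = GridSearchCV(LogisticRegression(tol=1e-4, max_iter=1000),
                      param_grid={'C': Cs}, cv=cv, scoring='roc_auc')
    gs.fit(X, y)
    print('GridSearchCV best mean AUC:', gs.best_score_)

    Even with identical settings, the two numbers can still differ slightly because of the warm starting mentioned above.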
