Why xgboost.cv and sklearn.cross_val_score give different results?

花落未央 · asked 2021-01-05 12:42

I'm trying to make a classifier on a data set. I first used XGBoost:

import xgboost as xgb
import pandas as pd
import numpy as np

train = pd.read_csv("tra
1 Answer
  • answered 2021-01-05 13:05

    This question is a bit old, but I ran into the problem today and figured out why the results given by xgboost.cv and sklearn.model_selection.cross_val_score are quite different.

    By default, cross_val_score uses KFold or StratifiedKFold with shuffle=False, so the folds are contiguous blocks of the data rather than random draws, whereas xgboost.cv shuffles the rows (controlled by its seed argument) before splitting.
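This default is easy to verify directly on the splitters themselves; a quick check (the toy arrays below are illustrative, not the question's dataset):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data (illustrative only, not the question's CSV).
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 1] * 5)

# Default KFold: shuffle=False, so each test fold is a contiguous slice.
first_train, first_test = next(iter(KFold(n_splits=5).split(X)))
print(first_test)  # -> [0 1]

# With shuffle=True and a fixed random_state, the folds are random but reproducible.
a = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=23333).split(X, y))
b = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=23333).split(X, y))
print(all(np.array_equal(s1[1], s2[1]) for s1, s2 in zip(a, b)))  # -> True
```

The first print shows the unshuffled fold is just the first two rows, which is why unshuffled cross_val_score and shuffled xgboost.cv score different splits of the same data.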

    So if you do this, then you should get the same results:

    from sklearn.model_selection import StratifiedKFold, cross_val_score

    cross_val_score(estimator, X=train_features, y=train_labels, scoring="neg_log_loss",
        cv=StratifiedKFold(shuffle=True, random_state=23333))
    

    Keep the random_state in StratifiedKFold and the seed in xgboost.cv the same to get exactly reproducible results.
