I'm trying to build a classifier on a data set. I first used XGBoost:
import xgboost as xgb
import pandas as pd
import numpy as np
train = pd.read_csv("tra
This question is a bit old, but I ran into the same problem today and figured out why the results given by xgboost.cv and sklearn.model_selection.cross_val_score are quite different.
By default, cross_val_score uses KFold or StratifiedKFold, whose shuffle argument defaults to False, so the folds are not drawn randomly from the data.
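You can verify this default directly (a quick sketch, independent of any particular dataset):

```python
from sklearn.model_selection import KFold, StratifiedKFold

# Both splitters default to shuffle=False: the folds are contiguous
# slices of the data in its original order, not random draws.
print(KFold().shuffle)            # prints False
print(StratifiedKFold().shuffle)  # prints False
```

xgboost.cv, by contrast, shuffles rows into folds using its seed argument, which is why the two tools disagree out of the box.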
So if you pass a shuffled splitter explicitly, you should get the same results:

from sklearn.model_selection import StratifiedKFold, cross_val_score

cross_val_score(estimator, X=train_features, y=train_labels, scoring="neg_log_loss",
                cv=StratifiedKFold(shuffle=True, random_state=23333))
Keep the random_state in StratifiedKFold and the seed in xgboost.cv the same to get exactly reproducible results.