I am running a regression using the XGBoost algorithm as:

clf = XGBRegressor(eval_set=[(X_train, y_train), (X_val, y_val)],
                   early_stopping_rounds=...)
For your TypeError: use get_booster() instead of booster()
print("Best Iteration: {}".format(clf.get_booster().best_iteration))
To use the number of the best iteration when you predict, there is a parameter called ntree_limit which specifies the number of boosters to use. The value generated by the training process is best_ntree_limit, which can be read after training your model as clf.get_booster().best_ntree_limit. More specifically, when you predict, use:

best_iteration = clf.get_booster().best_ntree_limit
clf.predict(data, ntree_limit=best_iteration)
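Putting it together, a minimal end-to-end sketch with synthetic data for illustration. This assumes an older XGBoost version where .fit() accepts early_stopping_rounds and eval_metric, matching the rest of this answer; newer releases move early stopping to the XGBRegressor() constructor and deprecate ntree_limit in favor of iteration_range:

import numpy as np
from xgboost import XGBRegressor

# Synthetic data, just so the sketch runs end to end
X_train, y_train = np.random.rand(100, 5), np.random.rand(100)
X_val, y_val = np.random.rand(20, 5), np.random.rand(20)

clf = XGBRegressor()
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_val, y_val)],
        eval_metric='rmse',
        early_stopping_rounds=10,
        verbose=True)

# Predict using only the trees up to the best iteration found by early stopping
best_iteration = clf.get_booster().best_ntree_limit
y_pred = clf.predict(X_val, ntree_limit=best_iteration)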
You can print your training and evaluation progress if you specify these parameters in the .fit() command:

clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_val, y_val)],
        eval_metric='rmse',
        early_stopping_rounds=10,
        verbose=True)
NOTE: the early_stopping_rounds parameter should be in the .fit() command, not in the XGBRegressor() instantiation.

Another NOTE: verbose = 50 in XGBRegressor() is redundant. The verbose variable should be in your .fit() function and is True or False. For what verbose=True does, read here under the verbose section. It directly affects your 3rd question.
Your error is that the booster
attribute of XGBRegressor is a string that specifies the kind of booster to use, not the actual booster instance. From the docs:
booster: string
Specify which booster to use: gbtree, gblinear or dart.
In order to get the actual booster, you can call get_booster()
instead:
>>> clf.booster
'gbtree'
>>> clf.get_booster()
<xgboost.core.Booster object at 0x118c40cf8>
>>> clf.get_booster().best_iteration
9
>>> print("Best Iteration: {}".format(clf.get_booster().best_iteration))
Best Iteration: 9
I'm not sure about the second half of your question, namely:

Furthermore, how can I print the training error of **each round**?

but hopefully you're unblocked!