Looking on the web, I am still confused about what the linear booster gblinear precisely is, and I am not alone. Following the documentation, it only has 3 parameters.
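If I read the docs correctly, those would be the following (a sketch using the Python package; the values are just placeholders):

```python
# The three gblinear-specific parameters as I understand them from the docs.
params = {
    "booster": "gblinear",
    "lambda": 1.0,       # L2 regularization on the weights
    "alpha": 0.0,        # L1 regularization on the weights
    "lambda_bias": 0.0,  # L2 regularization on the bias term
}
```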
I may as well do some squats between runs of gblinear, observe the results change almost every time, and claim that doing squats has an impact on the algorithm :)
In all seriousness, the algorithm that gblinear currently uses is not your "rather standard linear boosting". The thing responsible for the stochasticity is the use of lock-free parallelization ('hogwild') while updating the gradients during each iteration. Setting the seed doesn't affect anything; you would only get consistently reproducible results when running single-threaded (nthread=1). I would also advise against running with the default nthread setting, which uses the maximum possible number of OpenMP threads, as on many systems it results in much slower speed due to thread congestion. nthread should be set no higher than the number of physical CPU cores.
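For example, here is a minimal sketch of the reproducibility check, assuming the Python package (the data and parameter values are placeholders):

```python
import numpy as np
import xgboost as xgb

# Toy data; any dataset works for this check.
rng = np.random.RandomState(0)
X = rng.rand(1000, 10)
y = rng.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gblinear",
    "nthread": 1,   # single-threaded: weight updates are applied deterministically
    "lambda": 1.0,
    "alpha": 0.0,
}

# With nthread=1 the two runs produce identical models; with nthread > 1
# the lock-free ('hogwild') updates make the runs drift apart.
m1 = xgb.train(params, dtrain, num_boost_round=50)
m2 = xgb.train(params, dtrain, num_boost_round=50)
print(np.allclose(m1.predict(dtrain), m2.predict(dtrain)))  # True when nthread=1
```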
This free stochasticity might improve predictive performance in some situations. However, the pros frequently don't outweigh the cons. At some point, I will submit a pull request with an option for deterministic parallelization and an option for some additional control over feature selection at each boosting round.
For ground truth on all the available parameters that are specific to booster training, refer to the source of struct GBLinearTrainParam for gblinear and to the source of struct TrainParam for gbtree.