xgboost: which parameters are used in the linear booster gblinear?

离开以前 2021-01-22 11:54

Looking on the web, I am still confused about what exactly the linear booster gblinear is, and I am not alone.

Following the documentation it only has 3 p

1 answer
  •  夕颜 (original poster)
    2021-01-22 12:34

    I may as well do some squats between runs of gblinear, observe the results change almost every time, and claim that doing squats has an impact on the algorithm :)

    In all seriousness, the algorithm that gblinear currently uses is not your "rather standard linear boosting". The stochasticity comes from the use of lock-free parallelization ('hogwild') while updating the gradients during each iteration. Setting the seed doesn't change this; you only get consistently reproducible results when running single-threaded (nthread=1). I would also advise against running it with the default nthread setting, which uses the maximum possible number of OpenMP threads: on many systems this results in much slower training due to thread congestion. nthread should be no higher than the number of physical cores.

    This free stochasticity might improve predictive performance in some situations. However, the pros frequently don't outweigh the cons. At some point, I will submit a pull request with an option for deterministic parallelization and an option for some additional control over feature selection at each boosting round.

    For ground truth on all the available parameters that are specific to booster training, refer to the source of struct GBLinearTrainParam for gblinear and to the source of struct TrainParam for gbtree.
