I am trying to use XGBoost to model claims frequency on data generated from unequal-length exposure periods, but have been unable to get the model to treat the exposure correctly.
I have now worked out how to do this using setinfo to change the base_margin attribute to be the offset (as a linear predictor), i.e.:
setinfo(xgtrain, "base_margin", log(d$exposure))
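For context, here is a minimal sketch of the full workflow that call sits in. The data frame d with columns x1, x2, count and exposure, and the parameter values, are placeholders for illustration:

    library(xgboost)

    # Features and claim counts (assumed column names)
    X <- as.matrix(d[, c("x1", "x2")])
    xgtrain <- xgb.DMatrix(data = X, label = d$count)

    # Supply log(exposure) as the offset on the log-link scale
    setinfo(xgtrain, "base_margin", log(d$exposure))

    fit <- xgb.train(
      params = list(objective = "count:poisson", eta = 0.1, max_depth = 3),
      data = xgtrain,
      nrounds = 100
    )

    # Predictions include the base_margin, so they are expected counts for
    # each row's exposure; divide by exposure to recover frequency.
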
At least with the glm function in R, modeling count ~ x1 + x2 + offset(log(exposure)) with family=poisson(link='log') is equivalent to modeling I(count/exposure) ~ x1 + x2 with family=poisson(link='log') and weights=exposure. That is, normalize your count by exposure to get frequency, and model the frequency with exposure as the weight. Your estimated coefficients should be the same in both cases when using glm for Poisson regression. Try it for yourself on a sample data set, as in the sketch below.
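One way to check the equivalence is on simulated data; the data-generating process and coefficient values below are made up for illustration:

    # Simulate Poisson counts with varying exposure
    set.seed(42)
    n <- 5000
    d <- data.frame(
      x1 = rnorm(n),
      x2 = rnorm(n),
      exposure = runif(n, 0.1, 2)
    )
    d$count <- rpois(n, lambda = d$exposure * exp(0.3 + 0.5 * d$x1 - 0.2 * d$x2))

    # (1) Count response with log(exposure) as an offset
    fit_offset <- glm(count ~ x1 + x2 + offset(log(exposure)),
                      family = poisson(link = "log"), data = d)

    # (2) Frequency response (count/exposure) with exposure as the prior weight
    fit_weight <- glm(I(count / exposure) ~ x1 + x2,
                      family = poisson(link = "log"), weights = exposure, data = d)

    # Coefficients agree; glm will warn about non-integer responses in (2),
    # which is expected and harmless here.
    cbind(offset = coef(fit_offset), weighted = coef(fit_weight))
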
I'm not exactly sure what objective='count:poisson' corresponds to, but I would expect that setting your target variable to frequency (count/exposure) and using exposure as the weight in xgboost would be the way to go when exposures vary.
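A sketch of that alternative, reusing the simulated data frame d and the xgboost package from the sketches above (parameter values are arbitrary, and I have not verified how count:poisson handles non-integer labels in every version):

    # Frequency target with exposure as the instance weight
    X <- as.matrix(d[, c("x1", "x2")])
    dtrain <- xgb.DMatrix(
      data = X,
      label = d$count / d$exposure,  # frequency
      weight = d$exposure            # exposure as case weight
    )

    fit_freq <- xgb.train(
      params = list(objective = "count:poisson", eta = 0.1, max_depth = 3),
      data = dtrain,
      nrounds = 100
    )

    # Predictions are frequencies; multiply by exposure to get expected counts.
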