XGBoost - Poisson distribution with varying exposure / offset

無奈伤痛 2020-12-31 18:25

I am trying to use XGBoost to model claims frequency of data generated from unequal length exposure periods, but have been unable to get the model to treat the exposure corr

2 answers
  • 2020-12-31 18:55

    I have now worked out how to do this: use setinfo to set the base_margin attribute of the training DMatrix to the offset, expressed on the scale of the linear predictor (so the log of exposure under a log link), i.e.:

    setinfo(xgtrain, "base_margin", log(d$exposure))
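For context, the one-liner above can be placed in a fuller workflow. The following is only a minimal sketch on simulated data: the variable names (x1, x2, exposure, claims), the simulated rates, and the tuning values are all assumptions for illustration and are not from the original post. It also shows one way to get a per-unit-exposure frequency back out, by scoring with a zero base_margin.

```r
library(xgboost)

set.seed(1)
n <- 2000
x1 <- runif(n)
x2 <- runif(n)
exposure <- runif(n, 0.1, 2)                 # hypothetical exposure lengths
rate <- exp(-1 + 0.8 * x1 - 0.5 * x2)        # assumed true claims per unit exposure
claims <- rpois(n, rate * exposure)          # observed claim counts

X <- cbind(x1 = x1, x2 = x2)
xgtrain <- xgb.DMatrix(data = X, label = claims)

# The offset goes in on the log scale because count:poisson uses a log link
setinfo(xgtrain, "base_margin", log(exposure))

bst <- xgb.train(
  params = list(objective = "count:poisson", eta = 0.1, max_depth = 2),
  data = xgtrain,
  nrounds = 100
)

# Expected counts: the base_margin stored in xgtrain is used at prediction time
pred_counts <- predict(bst, xgtrain)

# Expected frequency per unit exposure: score the same rows with a zero offset
xgunit <- xgb.DMatrix(data = X)
setinfo(xgunit, "base_margin", rep(0, n))
pred_freq <- predict(bst, xgunit)
```

Because the margin is added before the exponential, pred_counts should equal pred_freq * exposure up to floating-point error. Note that any new data scored for counts needs its own base_margin set in the same way.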
    
  • 2020-12-31 19:07

    At least with the glm function in R, modeling count ~ x1 + x2 + offset(log(exposure)) with family=poisson(link='log') is equivalent to modeling I(count/exposure) ~ x1 + x2 with family=poisson(link='log') and weights=exposure. That is, divide your count by exposure to get a frequency, and model the frequency with exposure as the weight. The estimated coefficients should be the same in both cases when using glm for Poisson regression (R will warn about non-integer counts in the second form, but the fit is still valid). Try it for yourself using a sample data set.
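The equivalence described above can be checked with a short simulation. This is a sketch on made-up data: the data frame d, its column names, and the coefficients used to simulate it are assumptions chosen only for the demonstration.

```r
set.seed(42)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  exposure = runif(n, 0.1, 2)   # hypothetical varying exposure
)
d$count <- rpois(n, d$exposure * exp(-0.5 + 0.3 * d$x1 - 0.2 * d$x2))

# Formulation 1: counts as the response, log(exposure) as an offset
fit_offset <- glm(count ~ x1 + x2 + offset(log(exposure)),
                  family = poisson(link = "log"), data = d)

# Formulation 2: frequency as the response, exposure as the prior weight.
# glm warns about non-integer responses here; the warning is suppressed
# because it does not affect the coefficient estimates.
fit_weight <- suppressWarnings(
  glm(I(count / exposure) ~ x1 + x2,
      family = poisson(link = "log"), weights = exposure, data = d)
)

# Largest absolute difference between the two coefficient vectors
max_diff <- max(abs(coef(fit_offset) - coef(fit_weight)))
print(cbind(offset = coef(fit_offset), weighted = coef(fit_weight)))
```

Both fits solve the same score equations, since exposure * (count / exposure - mu) = count - exposure * mu, so max_diff should be zero up to the convergence tolerance of the fitting algorithm.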

    I'm not sure exactly what objective='count:poisson' does internally, but when exposures vary I would expect the same approach to carry over to xgboost: set the target variable to frequency (count/exposure) and pass exposure as the weight.
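A sketch of what that suggestion could look like in xgboost follows. As with the answer itself, this is a hedged illustration rather than a confirmed recipe: the simulated data, variable names, and tuning values are assumptions, and the thread does not verify that this matches the base_margin approach exactly.

```r
library(xgboost)

set.seed(2)
n <- 2000
X <- cbind(x1 = runif(n), x2 = runif(n))
exposure <- runif(n, 0.1, 2)   # hypothetical exposure lengths
claims <- rpois(n, exposure * exp(-1 + 0.8 * X[, "x1"] - 0.5 * X[, "x2"]))

# Label is the observed frequency; exposure enters as the instance weight
dfreq <- xgb.DMatrix(data = X, label = claims / exposure)
setinfo(dfreq, "weight", exposure)

bst_freq <- xgb.train(
  params = list(objective = "count:poisson", eta = 0.1, max_depth = 2),
  data = dfreq,
  nrounds = 100
)

# Predictions are frequencies per unit exposure; multiply by exposure for counts
freq_hat <- predict(bst_freq, xgb.DMatrix(data = X))
count_hat <- freq_hat * exposure
```

In principle the weighted Poisson gradient for a frequency label, exposure * (exp(f) - count / exposure), matches the gradient of the offset formulation, but other details such as the starting score may differ, so the two fits need not be identical.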
