Bootstrap by groups with Boot package

偶尔善良 提交于 2019-12-13 05:39:49

问题


I have a "my.dataset" like this:

   ID    Species  SEX     Category     V1      V2     V3
87790   Caniceps    F   F_Caniceps  -0.34   -0.55   0.61
199486  Caniceps    F   F_Caniceps  -0.34   -0.56   0.63
199490  Caniceps    F   F_Caniceps  -0.37   -0.54   0.57
199493  Caniceps    F   F_Caniceps  -0.35   -0.54   0.58
200139  Caniceps    F   F_Caniceps  -0.39   -0.51   0.51
393151  Caniceps    M   M_Caniceps  -0.36   -0.56   0.55
393154  Caniceps    M   M_Caniceps  -0.36   -0.55   0.55
486210  Caniceps    M   M_Caniceps  -0.41   -0.50   0.45
811945  Hyemalis    F   F_Hyemalis  -0.35   -0.54   0.55
811947  Hyemalis    F   F_Hyemalis  -0.35   -0.59   0.62
 15661  Hyemalis    M   M_Hyemalis  -0.34   -0.56   0.62
 15662  Hyemalis    M   M_Hyemalis  -0.35   -0.53   0.53
 15663  Hyemalis    M   M_Hyemalis  -0.33   -0.58   0.68
 15664  Vulcani     F   F_Vulcani   -0.29   -0.57   0.71
 15665  Vulcani     F   F_Vulcani   -0.29   -0.56   0.67
 15666  Vulcani     F   F_Vulcani   -0.28   -0.55   0.70
486218  Vulcani     F   F_Vulcani   -0.36   -0.55   0.56
486224  Vulcani     F   F_Vulcani   -0.36   -0.54   0.56
486212  Vulcani     M   M_Vulcani   -0.37   -0.53   0.53
486213  Vulcani     M   M_Vulcani   -0.37   -0.53   0.54
199479  Vulcani     M   M_Vulcani   -0.33   -0.57   0.61
199483  Vulcani     M   M_Vulcani   -0.33   -0.62   0.69
199484  Vulcani     M   M_Vulcani   -0.33   -0.60   0.65

I'm trying to perform a bootstrap with boot() to compute a statistic over variables "V1", "V2" and "V3", something like:

boot(my.dataset, statistic=lda (formula=lda(SEX~V1+V2+V3, data=my.dataset), R=3, sim = "ordinary")

But I need the resampling to take the same number of individuals depending on "Category" variable of "my.dataset". Any idea about how to do this?


回答1:


You are looking for the "strata" argument of the bootstrap. This is called a stratified bootstrap. Remark: i'm not sure that your boot code is correct, i would suggest something like:

   statfun = function(d, i) {lda(formula=SEX~V1+V2+V3, data=d[i, ])}
res <- boot(my.dataset, statfun, R=100, strata=factor(my.dataset$Species))

I don't know what the lda() function returns, but the statfunction must return a value or a vector for the bootstrap to work properly.

This method ensures that every level of the factor gets choosen proportionnaly to its number of observations. In the normal bootstrap, this is not the case and causes errors since some levels are missing in some replications and the linear model cannot be computed.

Note: in the strata argument, you have to specify again the name of the dataframe



来源:https://stackoverflow.com/questions/29028727/bootstrap-by-groups-with-boot-package

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!