Question
I have a "my.dataset" like this:
ID Species SEX Category V1 V2 V3
87790 Caniceps F F_Caniceps -0.34 -0.55 0.61
199486 Caniceps F F_Caniceps -0.34 -0.56 0.63
199490 Caniceps F F_Caniceps -0.37 -0.54 0.57
199493 Caniceps F F_Caniceps -0.35 -0.54 0.58
200139 Caniceps F F_Caniceps -0.39 -0.51 0.51
393151 Caniceps M M_Caniceps -0.36 -0.56 0.55
393154 Caniceps M M_Caniceps -0.36 -0.55 0.55
486210 Caniceps M M_Caniceps -0.41 -0.50 0.45
811945 Hyemalis F F_Hyemalis -0.35 -0.54 0.55
811947 Hyemalis F F_Hyemalis -0.35 -0.59 0.62
15661 Hyemalis M M_Hyemalis -0.34 -0.56 0.62
15662 Hyemalis M M_Hyemalis -0.35 -0.53 0.53
15663 Hyemalis M M_Hyemalis -0.33 -0.58 0.68
15664 Vulcani F F_Vulcani -0.29 -0.57 0.71
15665 Vulcani F F_Vulcani -0.29 -0.56 0.67
15666 Vulcani F F_Vulcani -0.28 -0.55 0.70
486218 Vulcani F F_Vulcani -0.36 -0.55 0.56
486224 Vulcani F F_Vulcani -0.36 -0.54 0.56
486212 Vulcani M M_Vulcani -0.37 -0.53 0.53
486213 Vulcani M M_Vulcani -0.37 -0.53 0.54
199479 Vulcani M M_Vulcani -0.33 -0.57 0.61
199483 Vulcani M M_Vulcani -0.33 -0.62 0.69
199484 Vulcani M M_Vulcani -0.33 -0.60 0.65
I'm trying to perform a bootstrap with boot() to compute a statistic over the variables "V1", "V2" and "V3", something like:
boot(my.dataset, statistic=lda (formula=lda(SEX~V1+V2+V3, data=my.dataset), R=3, sim = "ordinary")
But I need each resample to keep the same number of individuals in every level of the "Category" variable of "my.dataset". Any idea how to do this?
Answer 1:
You are looking for the "strata" argument of boot(). This is called a stratified bootstrap. Remark: I'm not sure that your boot() call is correct; I would suggest something like:
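library(boot); library(MASS)  # boot() and lda()
# Return a numeric vector (the discriminant coefficients), which boot() requires
statfun <- function(d, i) as.vector(lda(SEX ~ V1 + V2 + V3, data = d[i, ])$scaling)
res <- boot(my.dataset, statfun, R = 100, strata = factor(my.dataset$Category))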
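lda() returns a fitted model object rather than a number, but the statistic function must return a single value or a numeric vector for the bootstrap to work properly, so extract the quantity of interest (for example the discriminant coefficients in the $scaling component) inside the function.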
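To illustrate the point about return values, here is a minimal sketch on hypothetical toy data (the data frame "toy" and its random values are assumptions made for illustration, not the asker's data). It shows that the object returned by lda() is not numeric, and one possible way to turn it into a vector:

```r
library(MASS)  # lda()

# Hypothetical toy data: two sexes, three numeric variables
set.seed(42)
toy <- data.frame(
  SEX = rep(c("F", "M"), each = 10),
  V1 = rnorm(20), V2 = rnorm(20), V3 = rnorm(20)
)

fit <- lda(SEX ~ V1 + V2 + V3, data = toy)
class(fit)       # a model object of class "lda"
is.numeric(fit)  # FALSE, so it cannot be used directly as a boot() statistic

# One option: flatten the matrix of discriminant coefficients into a vector
coefs <- as.vector(fit$scaling)
length(coefs)    # one coefficient per predictor when there are two groups
```

Which quantity to extract depends on what you want to bootstrap; the coefficients are only one candidate.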
With strata, resampling is done with replacement within each level of the factor, so every level keeps its original number of observations in every replicate. In the ordinary bootstrap this is not guaranteed: a level can be missing from some replicates, in which case the model cannot be fitted and an error results.
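The effect of strata can be checked directly. The following sketch uses hypothetical toy data (the data frame "toy", the levels "A", "B", "C" and the helper count_fun are all made up for illustration) and bootstraps the per-group counts themselves:

```r
library(boot)

# Hypothetical toy data: three groups of unequal size
set.seed(1)
toy <- data.frame(
  Category = rep(c("A", "B", "C"), times = c(5, 3, 2)),
  V1 = rnorm(10)
)

# Statistic: number of resampled rows in each Category level
count_fun <- function(d, i) {
  as.vector(table(factor(d$Category[i], levels = c("A", "B", "C"))))
}

plain <- boot(toy, count_fun, R = 200)
strat <- boot(toy, count_fun, R = 200, strata = factor(toy$Category))

# Each row of $t is one replicate. With strata, every replicate should
# reproduce the original group sizes 5, 3, 2.
expected <- matrix(c(5, 3, 2), nrow = 200, ncol = 3, byrow = TRUE)
all(strat$t == expected)

# Without strata, the group sizes vary from replicate to replicate
apply(plain$t, 2, range)
```

If the group sizes in the unstratified run ever reach zero for some level, that is exactly the situation in which a model fitted on the resample would fail.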
Note: the strata argument is not looked up inside the data frame, so you have to give the data frame name again (my.dataset$column, not just the column name).
Source: https://stackoverflow.com/questions/29028727/bootstrap-by-groups-with-boot-package