Stratified sample when some strata are too small

99封情书 submitted on 2019-12-01 11:58:28

This doesn't answer your question about how to do this with the "sampling" package, but I've written a function called stratified that will do this for you.

If you have "devtools" installed, you can load it like this:

library(devtools)
source_gist(6424112)

Otherwise, just copy the code of the function from the Gist into your session and have fun.


Usage is simple:

set.seed(1) ## So you can reproduce this
stratified(DF, group = "geo_ID", size = 10)
# Some groups
# ---3, 4---
# contain fewer observations than desired number of samples.
# All observations have been returned from those groups.
#    geo_ID          V1        V2
# 7       1  1.51152200 2.3358481
# 9       1  2.01842371 2.9207286
# 14      1 -0.27878877 1.0464766
# 20      1  1.32011335 0.9002191
# 5       1  0.40426832 1.2727079
# :::SNIP:::
# 43      3  0.75816324 0.9967914
# 47      3 -0.81139318 1.5777441
# 55      3  0.08976065 0.3389009
# 51      3  0.32192527 1.9749074
# 48      4  1.44410126 1.8776498
# 44      4 -0.72670483 3.8484819
# 60      4  0.28488295 2.1372562
# 52      4 -0.78383894 2.1080727
# 56      4  0.27655075 1.6176663
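The behaviour in that output boils down to one rule: a stratum with fewer rows than size contributes all of its rows, and every other stratum contributes a random sample of size rows. The base-R sketch below illustrates that rule only. It is not the Gist's code. The helper name sample_strata and the toy data frame are hypothetical, and the toy data is made up so that groups 3 and 4 are too small, as in the output above.

```r
# Hypothetical helper, NOT the Gist's stratified(): a minimal base-R sketch of
# "sample `size` rows per stratum, but return every row of a too-small stratum".
sample_strata <- function(df, group, size) {
  idx <- split(seq_len(nrow(df)), df[[group]])  # row indices for each stratum
  n <- lengths(idx)
  small <- names(idx)[n < size]
  if (length(small) > 0) {
    message("Groups ", paste(small, collapse = ", "),
            " contain fewer observations than desired; all rows returned.")
  }
  picked <- lapply(idx, function(i) {
    # Index via sample.int() so a one-row stratum is not misread by sample()
    if (length(i) <= size) i else i[sample.int(length(i), size)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Hypothetical toy data: strata of 20, 15, 4 and 5 rows
set.seed(1)
toy <- data.frame(geo_ID = rep(1:4, times = c(20, 15, 4, 5)),
                  V1 = rnorm(44), V2 = runif(44, 0, 4))
out <- sample_strata(toy, "geo_ID", 10)
table(out$geo_ID)  # counts per geo_ID: 10, 10, 4, 5
```

Indexing with sample.int() rather than calling sample(i, size) directly avoids R's surprise where sample() on a single number n draws from 1:n.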

There are some "fun" features, like subsetting your strata in the function itself:

## Selects only "geo_ID" values equal to 1 or 4
stratified(DF, group = "geo_ID", size = 10, select = list(geo_ID = c(1, 4)))
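Conceptually, select filters the rows before any sampling happens, so only the listed strata are sampled. The sketch below shows just that filtering step in base R. The helper name subset_strata and the toy data are hypothetical, and the sketch assumes a select list in the same shape as the call above (column name mapped to allowed values).

```r
# Hypothetical helper (not the Gist's code): keep only rows whose columns
# take one of the values listed in `select`, e.g. list(geo_ID = c(1, 4)).
subset_strata <- function(df, select) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(select)) {
    keep <- keep & df[[col]] %in% select[[col]]  # AND across listed columns
  }
  df[keep, , drop = FALSE]
}

toy <- data.frame(geo_ID = rep(1:4, each = 3), V1 = 1:12)
sub <- subset_strata(toy, list(geo_ID = c(1, 4)))
unique(sub$geo_ID)  # 1 4
```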

... taking a proportionate sample:

## Just set the size argument to a value less than 1
stratified(DF, group = "geo_ID", size = .1)
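A proportionate sample draws roughly the same fraction from every stratum. The sketch below assumes each stratum's sample size is its row count times the proportion, rounded with R's round(); the Gist's exact rounding rule may differ. The helper name prop_sample and the toy data are hypothetical.

```r
# Hypothetical sketch: per-stratum size = round(n * prop) (an assumption,
# not necessarily the Gist's rule). Strata where this rounds to 0 drop out.
prop_sample <- function(df, group, prop) {
  idx <- split(seq_len(nrow(df)), df[[group]])
  picked <- lapply(idx, function(i) {
    k <- round(length(i) * prop)
    i[sample.int(length(i), k)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Hypothetical toy data: strata of 50, 30 and 20 rows
set.seed(1)
toy <- data.frame(geo_ID = rep(1:3, times = c(50, 30, 20)), V1 = rnorm(100))
out <- prop_sample(toy, "geo_ID", 0.1)
table(out$geo_ID)  # 5, 3 and 2 rows
```

Note that very small strata can round down to zero rows (for example, 4 rows times 0.1), so they disappear from a proportionate sample.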

... and using multiple columns as your groups. The comments at the Gist include some examples to try out.
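Stratifying on several columns amounts to treating every observed combination of their values as its own stratum. The sketch below does this directly in base R with interaction(), without calling the Gist. The columns region and sex and the toy data are hypothetical.

```r
# Hypothetical sketch: combine several columns into one stratum key.
# drop = TRUE keeps only combinations that actually occur in the data.
set.seed(1)
toy <- data.frame(region = rep(c("A", "B"), each = 6),
                  sex = rep(c("F", "M"), times = 6),
                  V1 = rnorm(12))
key <- interaction(toy$region, toy$sex, drop = TRUE)
idx <- split(seq_len(nrow(toy)), key)
# Take up to 2 rows per region/sex cell (all rows if a cell is smaller)
picked <- lapply(idx, function(i) i[sample.int(length(i), min(2, length(i)))])
out <- toy[unlist(picked, use.names = FALSE), ]
table(out$region, out$sex)  # 2 rows in each region/sex cell
```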
