Stratified sample when some strata are too small

梦谈多话 2021-01-15 21:24

I need to draw a stratified sample with n observations in each stratum, but some strata have fewer observations than n. If a stratum has too few obs

1 answer
  • 2021-01-15 22:31

    This doesn't answer your question about how to do this with the "sampling" package, but I've written a function called stratified that will do this for you.

    If you have "devtools" installed, you can load it like this:

    library(devtools)
    source_gist(6424112)
    

    Otherwise, just copy the code of the function from the Gist into your session and have fun.


    Usage is simple:

    set.seed(1) ## So you can reproduce this
    stratified(DF, group = "geo_ID", size = 10)
    # Some groups
    # ---3, 4---
    # contain fewer observations than desired number of samples.
    # All observations have been returned from those groups.
    #    geo_ID          V1        V2
    # 7       1  1.51152200 2.3358481
    # 9       1  2.01842371 2.9207286
    # 14      1 -0.27878877 1.0464766
    # 20      1  1.32011335 0.9002191
    # 5       1  0.40426832 1.2727079
    # :::SNIP:::
    # 43      3  0.75816324 0.9967914
    # 47      3 -0.81139318 1.5777441
    # 55      3  0.08976065 0.3389009
    # 51      3  0.32192527 1.9749074
    # 48      4  1.44410126 1.8776498
    # 44      4 -0.72670483 3.8484819
    # 60      4  0.28488295 2.1372562
    # 52      4 -0.78383894 2.1080727
    # 56      4  0.27655075 1.6176663
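
    To make the behaviour above concrete, here is a minimal base R sketch of the same idea: draw up to `size` rows per stratum and return every row from strata that are smaller than that. This is not the code from the Gist; the data frame and the helper name `sample_capped` are hypothetical and exist only for illustration.

    ```r
    # Hypothetical data: four strata with 20, 15, 4 and 5 rows
    set.seed(1)
    DF <- data.frame(
      geo_ID = rep(1:4, times = c(20, 15, 4, 5)),
      V1 = rnorm(44),
      V2 = rnorm(44, mean = 2)
    )

    # Hypothetical helper (not the Gist's code): up to `size` rows per stratum
    sample_capped <- function(df, group, size) {
      # Row indices, split by stratum
      idx_by_group <- split(seq_len(nrow(df)), df[[group]])
      picked <- lapply(idx_by_group, function(idx) {
        # Cap the draw at the stratum size
        k <- min(size, length(idx))
        # Index into idx instead of calling sample(idx, k), because
        # sample(x, k) treats a length-one numeric x as 1:x
        idx[sample.int(length(idx), k)]
      })
      df[unlist(picked, use.names = FALSE), , drop = FALSE]
    }

    out <- sample_capped(DF, group = "geo_ID", size = 10)
    # Strata 1 and 2 contribute 10 rows each; strata 3 and 4 contribute all 4 and 5
    table(out$geo_ID)
    ```

    Capping with min() is what keeps the draw from failing: asking sample.int() for more items than exist, without replacement, is an error.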
    

    There are some "fun" features, like subsetting your strata in the function itself:

    ## Selects only "geo_ID" values equal to 1 or 4
    stratified(DF, group = "geo_ID", size = 10, select = list(geo_ID = c(1, 4)))
    

    ... taking a proportionate sample:

    ## Just set the size argument to a value less than 1
    stratified(DF, group = "geo_ID", size = .1)
    

    ... and using multiple columns as your groups. The comments at the Gist include some examples to try out.
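
    For intuition, the last two features can be sketched together in base R: build one stratum per combination of grouping columns, then take a fixed fraction of each. This is a rough illustration rather than the Gist's implementation. The data and column names are hypothetical, and rounding the per-stratum count to the nearest integer is an assumption.

    ```r
    set.seed(1)
    # Hypothetical data with two grouping columns
    DF2 <- data.frame(
      region = rep(c("north", "south"), times = c(40, 20)),
      type   = rep(c("a", "b"), times = 30),
      value  = rnorm(60)
    )

    # One stratum per observed (region, type) combination
    strata <- interaction(DF2$region, DF2$type, drop = TRUE)
    idx_by_stratum <- split(seq_len(nrow(DF2)), strata)

    # Assumption: per-stratum count = stratum size * fraction, rounded
    frac <- 0.2
    picked <- lapply(idx_by_stratum, function(idx) {
      k <- round(length(idx) * frac)
      idx[sample.int(length(idx), k)]
    })

    out2 <- DF2[unlist(picked, use.names = FALSE), ]
    # Rows drawn from each (region, type) combination
    table(out2$region, out2$type)
    ```

    Under this rounding rule a very small stratum can end up with zero rows (a stratum of 2 at a fraction of 0.2 rounds to 0), so check the resulting counts when strata are tiny.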
