Stratified sampling - not enough observations

-上瘾入骨i 2021-01-15 01:17

What I would like to achieve is to get a 10% sample from each group, where a group is a combination of two factors: the recency category and the frequency category. So far I have thought about package

1 answer
  • 2021-01-15 02:16

    You could always do it yourself:

    # Draw a 10% sample from every r_cat x f_cat group, collecting row names.
    stratified <- NULL
    for (r in LETTERS[1:7]) {      # recency categories A..G
      for (f in LETTERS[1:6]) {    # frequency categories A..F
        # Row names of the rows that belong to this group
        grp <- rownames(d)[which(d$r_cat == r & d$f_cat == f)]
        # 10% of the group's own size; very small groups round down to 0 rows
        n <- round(length(grp) * 0.1)
        # Sample positions, not values, so a one-row group is handled safely
        stratified <- c(stratified, grp[sample.int(length(grp), n)])
      }
    }
    

    And then d[stratified, ] would be your stratified sample.
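
The same idea can also be written without explicit loops. What follows is only a sketch in base R, not the answerer's method: the toy data frame, the helper name `sample_group`, and the rule of keeping at least one row per non-empty group are all assumptions made for illustration. The last one is one possible way to deal with groups that have too few observations for 10% to round to a whole row.

```r
# Toy data for illustration only; the real `d` is assumed to have
# the columns r_cat and f_cat.
set.seed(42)
d <- data.frame(
  r_cat = sample(LETTERS[1:7], 500, replace = TRUE),
  f_cat = sample(LETTERS[1:6], 500, replace = TRUE),
  value = rnorm(500)
)

# Hypothetical helper: sample a fraction of the row indices of one group,
# keeping at least one row so that small groups are not dropped (assumption).
sample_group <- function(idx, frac = 0.1) {
  n <- max(1, round(length(idx) * frac))
  idx[sample.int(length(idx), n)]
}

# split() groups the row indices by every r_cat x f_cat combination;
# drop = TRUE discards combinations that do not occur in the data.
groups <- split(seq_len(nrow(d)), list(d$r_cat, d$f_cat), drop = TRUE)
picked <- unlist(lapply(groups, sample_group), use.names = FALSE)

stratified_sample <- d[picked, ]

# Number of sampled rows per group
table(stratified_sample$r_cat, stratified_sample$f_cat)
```

If empty or tiny groups should contribute nothing instead, replacing `max(1, ...)` with a plain `round(...)` gives the same behaviour as the loop version.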
