R data.table - sample by group with different sampling proportion

无人及你 2021-01-27 21:18

I would like to efficiently make a random sample by group from a data.table, but it should be possible to sample a different proportion for each group.

If I

2 answers
  •  无人共我
    2021-01-27 22:07

    Here's an option which uses a lookup table (and so doesn't rely on the ordering of vectors or groups).

    library(data.table)
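    # 20 rows, 10 per group; rep() spells out the group vector instead of relying on recycling
    DT <- data.table(group = rep(1:2, times = 10), val = sample(1:1000, 20))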
    
    sample_props <- data.table(group = 1:2, prop = c(.1,.5))
    
    group_sampler <- function(data, group_col, sample_props){
      # Sample a group-specific proportion (between 0 and 1) of the rows in each group.
      # inputs:
      #   data - data.table with data
      #   group_col - name(s) of the grouping column(s) (must be in both data.tables)
      #   sample_props - data.table with one row per group and a `prop` column
      # merge the `data` argument (not the global DT) with the lookup table
      ret <- merge(data, sample_props, by = group_col)
      # prop is constant within a group, so its first value sets the sample size
      ret <- ret[, .SD[sample(.N, ceiling(.N * prop[1L]))], by = group_col]
      return(ret[, prop := NULL][])
    }
    }
    
    # perform the sampling
    group_sampler(DT, 'group', sample_props)
    #>    group val
    #> 1:     1 721
    #> 2:     2 542
    #> 3:     2 680
    #> 4:     2 613
    #> 5:     2 170
    #> 6:     2 175
    

    Created on 2019-10-15 by the reprex package (v0.3.0)
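
    Since the question asks for efficiency, here is a possible variant of the same lookup-table idea that avoids `merge()` and `.SD` subsetting. It attaches `prop` with an update join, collects row numbers per group with `.I`, and subsets once. This is only a sketch under assumptions: the toy data, the fixed seed, and the names `idx` and `sampled` are illustrative and not from the original answer, and no timings were measured.

    ```r
    library(data.table)

    set.seed(42)  # assumption: fixed seed only to make the sketch reproducible

    # toy data (hypothetical): 10 rows in each of two groups
    DT <- data.table(group = rep(1:2, each = 10), val = 1:20)
    sample_props <- data.table(group = 1:2, prop = c(.1, .5))

    # update join: add each row's proportion to DT by reference
    DT[sample_props, on = "group", prop := i.prop]

    # per group, draw ceiling(.N * prop) row numbers of DT; V1 holds the picked indices
    idx <- DT[, .I[sample.int(.N, ceiling(.N * prop[1L]))], by = group]$V1

    # subset once, then drop the helper column from both the result and DT
    sampled <- DT[idx][, prop := NULL][]
    DT[, prop := NULL]

    sampled
    ```

    Note that the update join leaves `prop` as `NA` for any group missing from `sample_props`, which would make `sample.int()` fail, whereas the `merge()` version silently drops such groups.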
