I would like to efficiently draw a random sample by group from a data.table, but it should be possible to sample a different proportion for each group.
If I
Here's an option that uses a lookup table of proportions joined on the group column (and so doesn't rely on the ordering of vectors or groups).
library(data.table)
# example data: 20 rows, 10 per group, in random order
DT <- data.table(group = sample(rep(1:2, 10)), val = sample(1:1000, 20))
# lookup table: one sampling proportion per group
sample_props <- data.table(group = 1:2, prop = c(.1, .5))
group_sampler <- function(data, group_col, sample_props){
  # this function samples a proportion in (0, 1] from each group in the data.table
  # inputs:
  #   data - data.table with data
  #   group_col - column(s) used to group by (must be in both data.tables)
  #   sample_props - data.table with one sample proportion (column `prop`) per group
  ret <- merge(data, sample_props, by = group_col)
  # ceiling() keeps at least one row from every group
  ret <- ret[, .SD[sample(.N, ceiling(.N * prop[1L]))], by = group_col]
  return(ret[, prop := NULL][])
}
# perform the sampling
group_sampler(DT, 'group', sample_props)
#> group val
#> 1: 1 721
#> 2: 2 542
#> 3: 2 680
#> 4: 2 613
#> 5: 2 170
#> 6: 2 175
Created on 2019-10-15 by the reprex package (v0.3.0)
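Since the question asks for efficiency, a possible variant is sketched below. It keeps the same merge-then-sample logic but, instead of building `.SD` for every group, collects only the sampled row numbers with `.I` and subsets the merged table once; this is a common data.table idiom that is often faster on large tables, though I haven't benchmarked it here. The name `group_sampler_idx` is hypothetical, and like the original it assumes the proportion column in the lookup table is called `prop`.

```r
library(data.table)

# Hypothetical variant of group_sampler(): samples row positions per group via .I
# and subsets once, instead of materialising .SD for each group.
group_sampler_idx <- function(data, group_col, sample_props) {
  # merge() returns a new table, so `data` itself is not modified
  ret <- merge(data, sample_props, by = group_col)
  # For each group, draw ceiling(.N * prop) positions and map them to row numbers
  idx <- ret[, .I[sample(.N, ceiling(.N * prop[1L]))], by = group_col]$V1
  # Subset once, then drop the helper column
  ret[idx][, prop := NULL][]
}

# Usage: 10 rows per group, so expect ceiling(10 * 0.1) = 1 and ceiling(10 * 0.5) = 5 rows
DT <- data.table(group = rep(1:2, each = 10), val = sample(1:1000, 20))
sample_props <- data.table(group = 1:2, prop = c(.1, .5))
res <- group_sampler_idx(DT, "group", sample_props)
res[, .N, keyby = group]
```

Because `group_col` is passed straight to `merge()` and `by`, a character vector of several column names should also work, as long as those columns exist in both tables.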