In R: how to sum a variable by group between two dates

时光取名叫无心 2021-01-16 04:58

I have two data frames (DF1 and DF2):

(1) DF1 contains individual-level information, i.e. on 10,000 individuals nested in 30 units across 11 years (2000

2 Answers
  •  夕颜 (original poster)
    2021-01-16 05:59

    We can use data.table:

    library(data.table)
    setDT(DF1)
    setDT(DF2)
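    # Label each DF2 row with a running count of window start dates, sum x per
    # label, then join the sums back onto DF1.
    # Assumes DF2 is sorted by date and individual ids match that running count.
    DF1[DF2[, .(newvar = sum(x)), .(unit, individual = cumsum(date %in% DF1$date1))],
                 newvar := i.newvar, on = .(individual, unit)]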
    DF1
    #    individual unit      date1      date2 newvar
    #1:          1    1 2000-01-01 2001-01-01      6
    #2:          2    1 2001-01-02 2002-01-02     60
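
The question's data is cut off above, so the following is only a sketch on hypothetical data (the values of `DF1` and `DF2` are assumptions, chosen to reproduce the sums 6 and 60). It shows what the `cumsum(date %in% DF1$date1)` index does before the sums are joined back; `idx` and `sums` are illustrative names.

```r
library(data.table)

# Hypothetical data (not from the question), chosen only for illustration.
DF1 <- data.table(
  individual = 1:2,
  unit       = c(1L, 1L),
  date1      = as.Date(c("2000-01-01", "2001-01-02")),
  date2      = as.Date(c("2001-01-01", "2002-01-02"))
)
DF2 <- data.table(
  unit = 1L,
  date = as.Date(c("2000-01-01", "2000-06-15", "2001-01-01",
                   "2001-01-02", "2001-06-15", "2002-01-02")),
  x    = c(1, 2, 3, 10, 20, 30)
)

# TRUE marks the rows where a window starts; the running count of those
# TRUEs labels every DF2 row with the window it falls into.
idx <- cumsum(DF2$date %in% DF1$date1)

# Sum x per unit and window label; the label is named `individual`
# so that it can serve as a join key against DF1.
sums <- DF2[, .(newvar = sum(x)), by = .(unit, individual = idx)]
```

With this data `idx` is 1, 1, 1, 2, 2, 2 and `sums$newvar` is 6 and 60. The labelling only lines up with `individual` when DF2 is sorted by date, the windows follow one another without gaps or overlap, and the ids equal the running count; when that does not hold, the non-equi join is likely the safer choice.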
    

    Or we can use a non-equi join:

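    # For each DF1 row, sum x over the DF2 rows in the same unit with
    # date1 <= date <= date2, then join the sums back on unit and date1.
    DF1[DF2[DF1, sum(x), on = .(unit, date >= date1, date <= date2),
            by = .EACHI], newvar := V1, on = .(unit, date1=date)]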
    
    DF1
    #   individual unit      date1      date2 newvar
    #1:          1    1 2000-01-01 2001-01-01      6
    #2:          2    1 2001-01-02 2002-01-02     60
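
As a self-contained sketch of the non-equi join, the example below uses hypothetical data (two units, ids that do not follow row order; none of these values come from the question). It is a slight variant of the code above: because `by = .EACHI` returns one result per row of `DF1`, in `DF1`'s row order, the `V1` column is assigned directly instead of being joined back on `unit` and `date1`. An explicit row-by-row computation (`check`, an illustrative name) serves as a cross-check.

```r
library(data.table)

# Hypothetical data (not from the question), chosen only for illustration.
DF1 <- data.table(
  individual = c(7L, 3L, 5L),
  unit       = c(1L, 1L, 2L),
  date1      = as.Date(c("2000-01-01", "2001-01-02", "2000-03-01")),
  date2      = as.Date(c("2001-01-01", "2002-01-02", "2000-09-30"))
)
DF2 <- data.table(
  unit = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  date = as.Date(c("2000-01-01", "2000-06-15", "2001-01-02", "2001-06-15",
                   "2000-02-01", "2000-03-01", "2000-09-30")),
  x    = c(1, 2, 10, 20, 100, 200, 400)
)

# For each row of DF1, sum x over the DF2 rows in the same unit whose
# date lies in [date1, date2]; assign the per-row sums directly.
DF1[, newvar := DF2[DF1, sum(x),
                    on = .(unit, date >= date1, date <= date2),
                    by = .EACHI]$V1]

# Cross-check: the same sums computed one DF1 row at a time in base R.
check <- vapply(seq_len(nrow(DF1)), function(i) {
  sum(DF2$x[DF2$unit == DF1$unit[i] &
            DF2$date >= DF1$date1[i] &
            DF2$date <= DF1$date2[i]])
}, numeric(1))
stopifnot(isTRUE(all.equal(DF1$newvar, check)))
```

Assigning `$V1` directly avoids relying on `unit` and `date1` being a unique key, which matters if two individuals in the same unit share a start date but have different end dates. A window with no matching DF2 rows should come back as `NA` rather than 0 under the default `nomatch`, so it may need handling separately.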
    
