In R: how to sum a variable by group between two dates

时光取名叫无心 2021-01-16 04:58

I have two data frames (DF1 and DF2):

(1) DF1 contains individual-level information on 10,000 individuals nested in 30 units across 11 years (2000

2 Answers
  • 2021-01-16 05:43

    You were almost there. I only modified your for loop slightly and made sure the date variables are stored as Date objects:

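    # Make sure the date columns are stored as Date objects
    DF1$date1 <- as.Date(DF1$date1, "%Y-%m-%d")
    DF1$date2 <- as.Date(DF1$date2, "%Y-%m-%d")
    DF2$date  <- as.Date(DF2$date, "%Y-%m-%d")

    # For each row of DF1, sum DF2$x over the rows of the same unit
    # whose date lies between date1 and date2 (both inclusive)
    for (i in seq_len(nrow(DF1))) {
      DF1$newvar[i] <- sum(DF2$x[which(DF2$unit == DF1$unit[i] &
                                       DF2$date >= DF1$date1[i] &
                                       DF2$date <= DF1$date2[i])])
    }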
    

    The problem was that your condition required DF2$date to equal both DF1$date1 and DF1$date2 at the same time, rather than to lie between them. Also, length(DF1) returns the number of columns of a data frame. To get the number of rows, use either nrow(DF1) or dim(DF1)[1].
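
    To try this on something small, here is a minimal sketch with made-up toy data. The column names follow the code above, but the values are invented for illustration, since the original data is not shown. It also swaps the for loop for vapply(), which is just one possible way to write the same windowed sum.

    ```r
    # Hypothetical toy data: names follow the answer, values are invented.
    DF1 <- data.frame(
      individual = c(1, 2),
      unit       = c(1, 1),
      date1      = as.Date(c("2000-01-01", "2001-01-02")),
      date2      = as.Date(c("2001-01-01", "2002-01-02"))
    )
    DF2 <- data.frame(
      unit = 1,
      date = as.Date(c("2000-01-01", "2000-06-01", "2001-01-01",
                       "2001-01-02", "2001-06-01", "2002-01-02")),
      x    = c(1, 2, 3, 10, 20, 30)
    )

    # length() on a data frame counts columns, nrow() counts rows.
    length(DF1)  # 4
    nrow(DF1)    # 2

    # Same windowed sum as the loop, written with vapply() instead.
    DF1$newvar <- vapply(seq_len(nrow(DF1)), function(i) {
      in_window <- DF2$unit == DF1$unit[i] &
        DF2$date >= DF1$date1[i] &
        DF2$date <= DF1$date2[i]
      sum(DF2$x[in_window])
    }, numeric(1))

    DF1$newvar  # 6 60
    ```

    Note that a row of DF1 with no matching rows in DF2 gets a sum of 0 here, not NA.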

  • 2021-01-16 05:59

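    We can use data.table. This first version builds the grouping index with cumsum(), so it appears to rely on DF2 being sorted by date, on every date1 value occurring in DF2, and on individual being numbered 1, 2, ... in that order: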

    library(data.table)
    setDT(DF1)
    setDT(DF2)
    DF1[DF2[, .(newvar = sum(x)), .(unit, individual = cumsum(date %in% DF1$date1))],
                 newvar := newvar, on = .(individual, unit)]
    DF1
    #    individual unit      date1      date2 newvar
    #1:          1    1 2000-01-01 2001-01-01      6
    #2:          2    1 2001-01-02 2002-01-02     60
    

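    Alternatively, we can use a non-equi join, which matches each row of DF1 to the rows of DF2 in the same unit whose date falls between date1 and date2: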

    DF1[DF2[DF1, sum(x), on = .(unit, date >= date1, date <= date2),
            by = .EACHI], newvar := V1, on = .(unit, date1=date)]
    
    DF1
    #   individual unit      date1      date2 newvar
    #1:          1    1 2000-01-01 2001-01-01      6
    #2:          2    1 2001-01-02 2002-01-02     60
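
    For readers without data.table, a rough base R analogue of the range join can be sketched with merge() and aggregate(). This is not the data.table method above, only an illustration of the same idea. The toy data is invented, and the helper names pairs, sums and result are hypothetical.

    ```r
    # Hypothetical toy data: names follow the answers, values are invented.
    DF1 <- data.frame(
      individual = c(1, 2),
      unit       = c(1, 1),
      date1      = as.Date(c("2000-01-01", "2001-01-02")),
      date2      = as.Date(c("2001-01-01", "2002-01-02"))
    )
    DF2 <- data.frame(
      unit = 1,
      date = as.Date(c("2000-01-01", "2000-06-01", "2001-01-01",
                       "2001-01-02", "2001-06-01", "2002-01-02")),
      x    = c(1, 2, 3, 10, 20, 30)
    )

    # Pair every DF1 row with all DF2 rows of the same unit.
    pairs <- merge(DF1, DF2, by = "unit")

    # Keep only the pairs whose date lies between date1 and date2 (inclusive).
    pairs <- pairs[pairs$date >= pairs$date1 & pairs$date <= pairs$date2, ]

    # Sum x per individual and unit, then rename the summed column.
    sums <- aggregate(x ~ individual + unit, data = pairs, FUN = sum)
    names(sums)[names(sums) == "x"] <- "newvar"

    # Left join back onto DF1; individuals without any match get NA.
    result <- merge(DF1, sums, by = c("individual", "unit"), all.x = TRUE)
    result$newvar  # 6 60
    ```

    The intermediate pairs table holds every DF1 row crossed with every DF2 row of the same unit, so on large data this is likely to need far more memory than the non-equi join.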
    