I have two data frames (DF1 and DF2):
(1) DF1 contains individual-level information, i.e. on 10,000 individuals nested in 30 units across 11 years (2000
You were almost there. I only modified your for loop slightly and made sure that the date variables are stored as Date objects:
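# Make sure the date columns are Date objects, not character strings
DF1$date1 = as.Date(DF1$date1, "%Y-%m-%d")
DF1$date2 = as.Date(DF1$date2, "%Y-%m-%d")
DF2$date = as.Date(DF2$date, "%Y-%m-%d")
# For each row of DF1, sum DF2$x over the same unit and the window date1..date2
for (i in 1:nrow(DF1)) {
  DF1$newvar[i] <- sum(DF2$x[which(DF2$unit == DF1$unit[i] &
                                   DF2$date >= DF1$date1[i] &
                                   DF2$date <= DF1$date2[i])])
}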
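To try the loop on something concrete, here is a minimal sketch on made-up data. The values in DF1 and DF2 below are hypothetical (only the column names individual, unit, date1, date2, date and x come from the code in this thread), chosen so that the sums are easy to check by hand. It also pre-allocates newvar and uses seq_len(), which is just one possible way to write the same loop.

```r
# Hypothetical toy data; column names follow the code in this thread.
DF1 <- data.frame(
  individual = c(1, 2),
  unit       = c(1, 1),
  date1      = as.Date(c("2000-01-01", "2001-01-02")),
  date2      = as.Date(c("2001-01-01", "2002-01-02"))
)
DF2 <- data.frame(
  unit = 1,
  date = as.Date(c("2000-01-01", "2000-06-01", "2001-01-01",
                   "2001-01-02", "2001-06-01", "2002-01-02")),
  x    = c(1, 2, 3, 10, 20, 30)
)

# Pre-allocate the result column, then fill it row by row.
DF1$newvar <- NA_real_
for (i in seq_len(nrow(DF1))) {
  # TRUE for DF2 rows in the same unit whose date lies in [date1, date2]
  in_window <- DF2$unit == DF1$unit[i] &
    DF2$date >= DF1$date1[i] &
    DF2$date <= DF1$date2[i]
  DF1$newvar[i] <- sum(DF2$x[in_window])
}

DF1$newvar
# [1]  6 60
```

With this toy data the first window covers x = 1, 2, 3 and the second covers x = 10, 20, 30, so the two sums can be verified by eye.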
The problem was that you were asking DF2$date to be simultaneously == DF1$date1 & DF1$date2.
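A small illustration of the difference, assuming the original condition compared the date for equality with both endpoints. The dates and the names d, start and end are made up for this sketch only.

```r
# Hypothetical dates, only to illustrate the two conditions.
d     <- as.Date(c("2000-01-01", "2000-06-01", "2001-01-01"))
start <- as.Date("2000-01-01")
end   <- as.Date("2001-01-01")

# Equality with both endpoints: no date can equal two different days.
both_equal <- d == start & d == end
# Range test: keeps every date from start to end, endpoints included.
in_range   <- d >= start & d <= end

sum(both_equal)  # 0
sum(in_range)    # 3
```

The equality version selects nothing whenever the two endpoints differ, so the sum over the selected rows is always 0.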
Also, length(DF1) gives you the number of columns. To get the number of rows you can use either nrow(DF1) or dim(DF1)[1].
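A quick check on a made-up data frame (df and its columns are hypothetical) shows the difference:

```r
# Hypothetical data frame with 3 rows and 2 columns.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

length(df)   # 2, the number of columns
nrow(df)     # 3, the number of rows
dim(df)[1]   # 3, also the number of rows
```

So a loop written as for(i in 1:length(df)) would stop after 2 iterations here instead of visiting all 3 rows.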
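We can use data.table:
library(data.table)
setDT(DF1)
setDT(DF2)
# Group DF2 by unit and by a running counter that increases at every date found in
# DF1$date1, then join the sums back on (individual, unit). This relies on DF2 being
# sorted by date and on the counter lining up with DF1$individual.
DF1[DF2[, .(newvar = sum(x)), .(unit, individual = cumsum(date %in% DF1$date1))],
    newvar := newvar, on = .(individual, unit)]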
DF1
#    individual unit      date1      date2 newvar
#1:          1    1 2000-01-01 2001-01-01      6
#2:          2    1 2001-01-02 2002-01-02     60
Or we can use a non-equi join:
DF1[DF2[DF1, sum(x), on = .(unit, date >= date1, date <= date2),
        by = .EACHI], newvar := V1, on = .(unit, date1 = date)]
DF1
#    individual unit      date1      date2 newvar
#1:          1    1 2000-01-01 2001-01-01      6
#2:          2    1 2001-01-02 2002-01-02     60