I have a data set indexed by two ID variables (one nested in the other) and a date, and I wish to calculate a rolling statistic on this data.
My real dataset is large.
I'm sure there is a better way to do this, but one thing you can do is avoid the full Cartesian join, which is what's killing you. Instead, generate an interim table that holds only the join keys you need:
dt.dates <- dt1[, list(date.join=seq(as.Date(date - 1, origin="1970-01-01"), by="-1 day", length.out=30)), by=list(date, id1, id2)]
For each date-id group, we've now generated the list of allowable join dates (the 30 days before that date). Now we join back to the data and compute our metric.
setkey(dt.dates, date.join, id1, id2)
setkey(dt1,date,id1,id2)
dt.dates[dt1][ , sum(var1)/sum(var2), by=list(id1, id2, date)]
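To make the steps above easier to try out, here is a minimal self-contained sketch. The toy data (60 days, uniform random `var1` and `var2`, the seed) is made up for illustration and is not your data. I have also swapped the keyed join for an explicit `on=` join with `nomatch = 0L`, which assumes a reasonably recent version of data.table, so treat it as one possible variant rather than the definitive way to do it.

```r
library(data.table)

# Hypothetical toy data: 2 x 2 id groups, 60 consecutive days each
set.seed(1)
days <- seq(as.Date("2012-05-01"), by = "day", length.out = 60)
dt1 <- data.table(
  id1  = rep(c("a", "b"), each = 2 * length(days)),
  id2  = rep(rep(c("x", "y"), each = length(days)), times = 2),
  date = rep(days, times = 4)
)
dt1[, `:=`(var1 = runif(.N, 1, 10), var2 = runif(.N, 1, 2))]

# Interim table: for each (date, id1, id2), the 30 preceding calendar days
dt.dates <- dt1[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"),
                                       by = "-1 day", length.out = 30)),
                by = list(date, id1, id2)]

# Match each lookback day to the observation on that day, dropping
# observations that fall in nobody's window, then aggregate per target date
joined <- dt.dates[dt1, on = c(date.join = "date", id1 = "id1", id2 = "id2"),
                   nomatch = 0L]
res <- joined[, list(newstat = sum(var1) / sum(var2)), by = list(id1, id2, date)]
```

The interim table has 30 rows per original row, so it grows linearly with the data instead of quadratically as the full Cartesian join does. Dates near the start of the series simply aggregate over however many earlier days exist.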
I couldn't replicate your result for 6/12, but I think that is a seeding issue (our random data probably differ). Compare the join approach with a direct calculation for that date:
> dt.dates[dt1][ , sum(var1)/sum(var2), by=list(id1, id2, date)][date=="2012-06-12"]
   id1 id2       date       V1
1:   a   x 2012-06-12 3.630631
2:   a   y 2012-06-12 4.434783
3:   b   x 2012-06-12 3.634783
4:   b   y 2012-06-12 4.434783
> dt1[date < as.Date("2012-06-12") & date > as.Date("2012-06-12")-31, list("newstat"=sum(var1)/sum(var2), "date"=as.Date("2012-06-12")),by=list(id1,id2)]
   id1 id2  newstat       date
1:   a   x 3.630631 2012-06-12
2:   a   y 4.434783 2012-06-12
3:   b   x 3.634783 2012-06-12
4:   b   y 4.434783 2012-06-12
Both approaches give the same result.