How to calculate a rolling statistic in R using data.table on unevenly spaced data

前端未结


I have a data set indexed by two ID variables (one nested in the other) and date, and I wish to calculate a rolling statistic in this data.

My real dataset is large

1 Answer

南旧

2021-01-13 11:50

I'm sure there is a better way to do this, but one thing you can do is avoid the full Cartesian join, which is what's killing you, by first generating an interim table with the join keys:

# For every (date, id1, id2) row, list the 30 calendar days that precede it.
dt.dates <- dt1[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"), by = "-1 day", length.out = 30)), by = list(date, id1, id2)]

For each date-id group, we've now generated the list of allowable join dates. Now we join back to the data and compute our metric.
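To make the expansion concrete, here is a minimal sketch on a single made-up observation. The values in `dt1` are hypothetical and are not the asker's data; only the `dt.dates` expression comes from the answer.

```r
library(data.table)

# Hypothetical single observation, used only to show the expansion.
dt1 <- data.table(id1 = "a", id2 = "x", date = as.Date("2012-06-12"),
                  var1 = 50, var2 = 5)

# Same expression as in the answer: 30 join dates, counting back from date - 1.
dt.dates <- dt1[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"),
                                       by = "-1 day", length.out = 30)),
                by = list(date, id1, id2)]

nrow(dt.dates)             # 30: one row per day in the look-back window
range(dt.dates$date.join)  # "2012-05-13" "2012-06-11"
```

The interim table therefore grows to 30 rows per observation, regardless of how many other observations share the same IDs.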

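# Key both tables so that dt1's date is matched against date.join.
setkey(dt.dates, date.join, id1, id2)
setkey(dt1, date, id1, id2)
# After the join, `date` is the target date from dt.dates; aggregate over its window.
dt.dates[dt1][, sum(var1) / sum(var2), by = list(id1, id2, date)]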
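The whole pipeline can be sketched end to end on a small toy table. Everything about the data here is an assumption made for illustration (the values, the gap before 2012-06-03, the `newstat` column name). I have also added `nomatch = 0L`, which is not in the code above: it drops observations that fall in nobody's window, which would otherwise show up as a group with an `NA` date.

```r
library(data.table)

# Hypothetical toy data with unevenly spaced dates (not the asker's dataset).
dt1 <- data.table(
  id1  = "a",
  id2  = c("x", "x", "x", "y", "y", "y"),
  date = as.Date(c("2012-06-01", "2012-06-05", "2012-06-12",
                   "2012-04-01", "2012-06-03", "2012-06-12")),
  var1 = c(10, 30, 50, 100, 40, 60),
  var2 = c(2, 3, 5, 1, 4, 6)
)

# Interim table of join keys: the 30 days before each observation.
dt.dates <- dt1[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"),
                                       by = "-1 day", length.out = 30)),
                by = list(date, id1, id2)]

setkey(dt.dates, date.join, id1, id2)
setkey(dt1, date, id1, id2)

# nomatch = 0L is my addition: keep only observations that fall in some window.
roll <- dt.dates[dt1, nomatch = 0L][, list(newstat = sum(var1) / sum(var2)),
                                    by = list(id1, id2, date)]
roll
# a/x on 2012-06-05: 10 / 2 = 5
# a/x on 2012-06-12: (10 + 30) / (2 + 3) = 8
# a/y on 2012-06-12: 40 / 4 = 10 (2012-04-01 is more than 30 days back)
```

Dates with no earlier observation inside the window (the first date of each group, and 2012-06-03 for a/y) produce no row at all, so they would need to be added back if you want an explicit `NA` for them.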

I couldn't replicate your result for 6/12, but I think we have a seeding issue. Compare the join-based result with a direct filter on the 30 days before that date:

> dt.dates[dt1][, sum(var1) / sum(var2), by = list(id1, id2, date)][date == "2012-06-12"]
   id1 id2       date       V1
1:   a   x 2012-06-12 3.630631
2:   a   y 2012-06-12 4.434783
3:   b   x 2012-06-12 3.634783
4:   b   y 2012-06-12 4.434783
> dt1[date < as.Date("2012-06-12") & date > as.Date("2012-06-12") - 31, list(newstat = sum(var1) / sum(var2), date = as.Date("2012-06-12")), by = list(id1, id2)]
   id1 id2  newstat       date
1:   a   x 3.630631 2012-06-12
2:   a   y 4.434783 2012-06-12
3:   b   x 3.634783 2012-06-12
4:   b   y 4.434783 2012-06-12

Both approaches give the same result.
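If you want to run the same comparison for every date rather than a single one, something like the following should work. It is only a sketch: `rolling_ratio()` and `direct_stat()` are hypothetical helper names I made up, the toy data is invented, and `nomatch = 0L` is my addition to drop unmatched observations.

```r
library(data.table)

# Hypothetical toy data with unevenly spaced dates (not the asker's dataset).
dt1 <- data.table(
  id1  = "a",
  id2  = c("x", "x", "x", "y", "y", "y"),
  date = as.Date(c("2012-06-01", "2012-06-05", "2012-06-12",
                   "2012-04-01", "2012-06-03", "2012-06-12")),
  var1 = c(10, 30, 50, 100, 40, 60),
  var2 = c(2, 3, 5, 1, 4, 6)
)

# Hypothetical helper wrapping the join-based method; `window` is in days.
rolling_ratio <- function(obs, window = 30) {
  keys <- obs[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"),
                                     by = "-1 day", length.out = window)),
              by = list(date, id1, id2)]
  setkey(keys, date.join, id1, id2)
  obs <- copy(obs)                 # avoid re-keying the caller's table
  setkey(obs, date, id1, id2)
  keys[obs, nomatch = 0L][, list(newstat = sum(var1) / sum(var2)),
                          by = list(id1, id2, date)]
}

# Hypothetical helper: direct filter for one group and one target date.
direct_stat <- function(obs, g1, g2, d, window = 30) {
  w <- obs[id1 == g1 & id2 == g2 & date < d & date > d - (window + 1)]
  sum(w$var1) / sum(w$var2)
}

roll <- rolling_ratio(dt1)
direct <- vapply(seq_len(nrow(roll)), function(k) {
  direct_stat(dt1, roll$id1[k], roll$id2[k], roll$date[k])
}, numeric(1))

all.equal(roll$newstat, direct)  # TRUE when both methods agree
```

The row-by-row check is slow by design, so it is best run on a sample of the real data rather than on all of it.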
