How to calculate a rolling statistic in R using data.table on unevenly spaced data

面向向阳花 2021-01-13 11:06

I have a data set indexed by two ID variables (one nested in the other) and a date, and I wish to calculate a rolling statistic on this data.

My real dataset is large

1 Answer
  • 2021-01-13 11:50

    I'm sure there is a better way to do this, but one thing you can do is avoid the full Cartesian join, which is what's killing you, by first generating an interim table with the join keys:

    dt.dates <- dt1[, list(date.join=seq(as.Date(date - 1, origin="1970-01-01"), by="-1 day", len=30)), by=list(date, id1, id2)]
    

    For each date-id group, we've now generated the list of allowable join dates (the 30 days before that date). Now we join back to the data and compute our metric.

    setkey(dt.dates, date.join, id1, id2)
    setkey(dt1,date,id1,id2)
    dt.dates[dt1][ , sum(var1)/sum(var2), by=list(id1, id2, date)]
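
    To make the two steps above concrete, here is a minimal end-to-end sketch on made-up data. The toy `dt1` below (its dates and the `runif` values), the `stat` column name, and the use of `nomatch = NULL` to drop observations that fall in no window are all assumptions for illustration, not the asker's data.

    ```r
    library(data.table)

    # Hypothetical toy data: two nested IDs, unevenly spaced dates
    set.seed(1)
    dt1 <- data.table(
      id1  = rep(c("a", "b"), each = 6),
      id2  = rep(c("x", "y"), times = 6),
      date = as.Date("2012-06-01") + c(0, 0, 3, 3, 11, 11, 0, 0, 5, 5, 11, 11),
      var1 = runif(12, 1, 10),
      var2 = runif(12, 1, 5)
    )

    # Interim table: for every (date, id1, id2), the 30 preceding calendar days
    dt.dates <- dt1[, list(date.join = seq(as.Date(date - 1, origin = "1970-01-01"),
                                           by = "-1 day", length.out = 30)),
                    by = list(date, id1, id2)]

    # Keyed join: each observation is attached to every window that contains it
    setkey(dt.dates, date.join, id1, id2)
    setkey(dt1, date, id1, id2)
    res <- dt.dates[dt1, nomatch = NULL][
      , list(stat = sum(var1) / sum(var2)), by = list(id1, id2, date)]

    # Brute-force check for a single group and date
    chk <- dt1[id1 == "a" & id2 == "x" &
                 date < as.Date("2012-06-12") & date > as.Date("2012-06-12") - 31,
               sum(var1) / sum(var2)]
    ```

    Dates with no earlier observation inside the window (here, the first date of each group) simply do not appear in `res`.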
    

    I couldn't replicate your result for 6/12, but I think we have a seeding issue. Compare:

    > dt.dates[dt1][ , sum(var1)/sum(var2), by=list(id1, id2, date)][date=="2012-06-12"]
       id1 id2       date       V1
    1:   a   x 2012-06-12 3.630631
    2:   a   y 2012-06-12 4.434783
    3:   b   x 2012-06-12 3.634783
    4:   b   y 2012-06-12 4.434783
    > dt1[date < as.Date("2012-06-12") & date > as.Date("2012-06-12")-31, list("newstat"=sum(var1)/sum(var2), "date"=as.Date("2012-06-12")),by=list(id1,id2)]
       id1 id2  newstat       date
    1:   a   x 3.630631 2012-06-12
    2:   a   y 4.434783 2012-06-12
    3:   b   x 3.634783 2012-06-12
    4:   b   y 4.434783 2012-06-12
    

    Basically the same result.
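
    As a possible alternative to the interim table, newer data.table versions (1.9.8 and later) support non-equi joins, which can express the window directly. This is a different technique from the one above, sketched here on hypothetical toy data; the `windows` table and the `start`, `end`, and `stat` names are assumptions for illustration.

    ```r
    library(data.table)

    # Hypothetical toy data, one row per (id1, id2, date)
    dt1 <- data.table(
      id1  = "a",
      id2  = "x",
      date = as.Date("2012-06-01") + c(0, 3, 11, 40),
      var1 = c(2, 4, 6, 8),
      var2 = c(1, 1, 2, 2)
    )

    # One window per row: the 30 days before the row's date, [date - 30, date)
    windows <- dt1[, list(id1, id2, start = date - 30, end = date)]

    # Non-equi join: for each window, aggregate the rows of dt1 falling inside it.
    # nomatch = NULL drops windows that contain no earlier observation.
    res2 <- dt1[windows,
                on = .(id1, id2, date >= start, date < end),
                list(stat = sum(var1) / sum(var2)),
                by = .EACHI, nomatch = NULL]

    # The two join columns both come back named "date"; rename them explicitly
    setnames(res2, c("id1", "id2", "start", "date", "stat"))
    ```

    This avoids materialising 30 rows per observation, which may matter on a large table, at the cost of requiring a more recent data.table.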
