Is there a _fast_ way to run a rolling regression inside data.table?

[愿得一人] 2020-11-30 08:07

I am running rolling regressions in R, with the data stored in a data.table.

I have a working version, however it feels like a hack -- I

2 answers
  • 2020-11-30 08:32

    The roll_regres function from the rollRegres package is roughly 19 times faster than rollapplyr with lm.fit (mean run time 14585 vs. 766 microseconds in the benchmark at the end of this answer):

    require(zoo)
    require(data.table)
    require(microbenchmark)
    set.seed(1)
    
    tt <- seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by="day")
    px <- rnorm(366, 95, 1)
    
    DT <- data.table(period=tt, pvec=px)
    
    dtt <- DT[,tnum:=as.numeric(period)][, list(pvec, tnum)]
    
    # this is a badly conditioned problem, as tnum and its square are highly correlated
    cor(dtt$tnum, dtt$tnum^2)
    #R [1] 0.9999951
    
    # so we center it to avoid numerical issues in the comparisons
    dtt$tnum <- dtt$tnum - mean(dtt$tnum)
    cor(dtt$tnum, dtt$tnum^2)
    #R [1] -2.355659e-21
    
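    # matrix version for lm.fit; note that this resets DT$tnum to the uncentered
    # values (dtt is a separate copy), so dtx is only used for timing
    dtx <- as.matrix(DT[,tnum:=as.numeric(period)][, tnum2:= tnum^2][, int:=1][, list(pvec, int, tnum, tnum2)])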
    
    rollreg <- function(dd)
      coef(lm(pvec ~ tnum + I(tnum^2), data = as.data.frame(dd)))
    rollreg.fit <- function(dd) coef(lm.fit(y=dd[,1], x=dd[,-1]))
    
    rr <- function(dd) rollapplyr(
      dd, width=20, FUN = rollreg, by.column = FALSE, align = "right")
    rr.fit <- function(dd) rollapplyr(
      dd, width=20, FUN = rollreg.fit, by.column = FALSE, align = "right")
    
    #####
    # use rollRegres
    library(rollRegres)
    rollreg_out    <- rr(dtt)
    rollRegres_out <- roll_regres(pvec ~ tnum + I(tnum^2), dtt, width = 20L)
    
    # show that both approaches give the same coefficients
    all.equal(rollRegres_out$coefs[-(1:19), ], rollreg_out,
              check.attributes = FALSE)
    #R [1] "Mean relative difference: 4.985435e-08"
    
    #####
    # benchmark
    microbenchmark(
      rr = rr(dtt),
      rr.fit = rr.fit(dtx),
      roll_regres = roll_regres(pvec ~ tnum + I(tnum^2), dtt ,width = 20L),
      times = 5)
    #R Unit: microseconds
    #R        expr        min         lq        mean     median         uq       max neval
    #R          rr 279404.357 279456.901 282071.3414 279989.840 282201.396 289304.21     5
    #R      rr.fit  13744.598  14017.981  14585.2106  14147.166  14887.117  16129.19     5
    #R roll_regres    621.037    660.939    766.7364    721.383    843.853    986.47     5
    
  • 2020-11-30 08:43

    Not as far as I know; data.table doesn't have any special features for rolling windows. Other packages already implement rolling functionality on vectors, so they can be used in the j of data.table. If they are not efficient enough, and no package has faster versions (?), then it's a case of writing faster versions yourself and (of course) contributing them: either to an existing package or creating your own.
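
    As a rough illustration of calling a rolling function from another package inside j, the sketch below uses zoo::rollapplyr with lm.fit to compute a rolling slope per group. The grouping column grp, the helper win_slope and the simulated data are hypothetical and only there to make the example self-contained; it is one possible arrangement, not a tuned implementation.

    ```r
    library(data.table)
    library(zoo)

    set.seed(1)
    DT <- data.table(
      grp  = rep(c("a", "b"), each = 30),  # hypothetical grouping column
      tnum = rep(1:30, times = 2),
      pvec = rnorm(60, 95, 1)
    )

    # Slope of pvec ~ tnum for one window; m is a matrix with columns pvec and tnum
    win_slope <- function(m) {
      fit <- lm.fit(x = cbind(1, m[, "tnum"]), y = m[, "pvec"])
      unname(coef(fit)[2])
    }

    # Call the rolling function in j, once per group; fill = NA pads the
    # first width - 1 rows of each group so the result matches the group length
    DT[, slope := rollapplyr(cbind(pvec = pvec, tnum = tnum), width = 20,
                             FUN = win_slope, by.column = FALSE, fill = NA),
       by = grp]
    ```

    The same pattern should work with any vector- or matrix-based rolling function: data.table handles the grouping and assignment, while the other package does the windowing.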

    Related questions (also follow the links within them):

    Using data.table to speed up rollapply
    R data.table sliding window
    Rolling regression over multiple columns in R
