Parallelize a rolling window regression in R

前端 未结 2 1050
眼角桃花
眼角桃花 2021-02-02 01:33

I\'m running a rolling regression very similar to the following code:

library(PerformanceAnalytics)
library(quantmod)
data(managers)

FL <- as.formula(Next(HA         


        
2条回答
  •  时光取名叫无心
    2021-02-02 02:09

    The obvious one is to use lm.fit() instead of lm() so you don't incur all the overhead in processing the formula etc.

    Update: So when I said obvious what I meant to say was blindingly obvious but deceptively difficult to implement!

    After a bit of fiddling around, I came up with this

    library(PerformanceAnalytics)
    library(quantmod)
    data(managers)
    

    The first stage is to realise that the model matrix can be prebuilt, so we do that and convert it back to a Zoo object for use with rollapply():

    mmat2 <- model.frame(Next(HAM1) ~ HAM1 + HAM2 + HAM3 + HAM4, data = managers, 
                         na.action = na.pass)
    mmat2 <- cbind.data.frame(mmat2[,1], Intercept = 1, mmat2[,-1])
    mmatZ <- as.zoo(mmat2)
    

    Now we need a function that will employ lm.fit() to do the heavy lifting without having to create design matrices at each iteration:

    MyRegression2 <- function(Z) {
        ## store value we want to predict for
        pred <- Z[31, -1, drop = FALSE]
        ## get rid of any rows with NA in training data
        Z <- Z[1:30, ][!rowSums(is.na(Z[1:30,])) > 0, ]
        ## Next() would lag and leave NA in row 30 for response
        ## but we precomputed model matrix, so drop last row still in Z
        Z <- Z[-nrow(Z),]
        ## fit the model
        fit <- lm.fit(Z[, -1, drop = FALSE], Z[,1])
        ## get things we need to predict, in case pivoting turned on in lm.fit
        p <- fit$rank
        p1 <- seq_len(p)
        piv <- fit$qr$pivot[p1]
        ## model coefficients
        beta <- fit$coefficients
        ## this gives the predicted value for row 31 of data passed in
        drop(pred[, piv, drop = FALSE] %*% beta[piv])
    }
    

    A comparison of timings:

    > system.time(Result <- rollapply(managers, 31, FUN="MyRegression",FL,
    +                                 by.column = FALSE, align = "right", 
    +                                 na.pad = TRUE))
       user  system elapsed 
      0.925   0.002   1.020 
    > 
    > system.time(Result2 <- rollapply(mmatZ, 31, FUN = MyRegression2,
    +                                  by.column = FALSE,  align = "right",
    +                                  na.pad = TRUE))
       user  system elapsed 
      0.048   0.000   0.05
    

    Which affords a pretty reasonable improvement over the original. And now check that the resulting objects are the same:

    > all.equal(Result, Result2)
    [1] TRUE
    

    Enjoy!

提交回复
热议问题