Speed up the loop operation in R

说谎 2020-11-22 00:04

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and a

10 answers
  •  [愿得一人]
    2020-11-22 00:23

    Processing with data.table is a viable option:

    n <- 1000000
    df <- as.data.frame(matrix(sample(1:10, n*9, TRUE), n, 9))
    colnames(df) <- paste("col", 1:9, sep = "")
    
    library(data.table)
    
    dayloop2.dt <- function(df) {
      dt <- data.table(df)
      dt[, Kumm. := {
        res <- .I;
        ifelse (res > 1,             
          ifelse ((col6 == shift(col6, fill = 0)) & (col3 == shift(col3, fill = 0)) , 
            res <- col9 + shift(res)                   
          , # else
            res <- col9                                 
          )
         , # else
          res <- col9
        )
      }
      ,]
      res <- data.frame(dt)
      return (res)
    }
    
    res <- dayloop2.dt(df)
    
    library(microbenchmark)
    
    # times = 10 matches the neval column in the output below
    m <- microbenchmark(dayloop2.dt(df), times = 10)
    #Unit: milliseconds
    #       expr      min        lq     mean   median       uq      max neval
    #dayloop2.dt(df) 436.4467 441.02076 578.7126 503.9874 575.9534 966.1042    10
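
One caveat, offered as a sketch rather than a drop-in replacement: in the function above, `shift(res)` lags the row index `.I`, not the accumulated value, so matching rows get `col9` plus the previous row number. If what is wanted is a running total of `col9` that restarts whenever `col3` or `col6` differs from the previous row (an assumption about the intent), grouping by `rleid()` and using `cumsum()` is one way to express it. The helper name `run_cumsum` and the toy data are made up for illustration.

```r
library(data.table)

# Hypothetical helper (not from the answer above): running total of col9
# that restarts whenever col3 or col6 changes from the previous row.
run_cumsum <- function(df) {
  dt <- as.data.table(df)
  # rleid() assigns one id per run of consecutive rows with equal (col3, col6)
  dt[, Kumm. := cumsum(col9), by = rleid(col3, col6)]
  as.data.frame(dt)
}

# Toy data, made up for illustration
toy <- data.frame(
  col3 = c(1, 1, 2, 2, 2),
  col6 = c(5, 5, 5, 5, 7),
  col9 = c(10, 20, 30, 40, 50)
)
out <- run_cumsum(toy)
print(out)
```

On the toy data the runs are rows 1-2, rows 3-4 and row 5, so `Kumm.` comes out as 10, 30, 30, 70, 50.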
    

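    Even without exploiting any gains from filtering on conditions, the data.table version is very fast: a median of about half a second for one million rows in the benchmark above. If the calculation can be restricted to a subset of the data, that helps further.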
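
To illustrate the subset point: data.table can apply `:=` only to rows matching a condition in `i`, so non-matching rows are never touched. A minimal sketch, assuming a made-up condition `col3 > 5` and toy data (neither comes from the question):

```r
library(data.table)

# Toy data, made up for illustration
dt <- data.table(col3 = rep(1:10, 2), col9 = 1:20)

# Hypothetical filter: only rows with col3 > 5 need the new column.
# The assignment runs on that subset; other rows are left as NA.
dt[col3 > 5, Kumm. := cumsum(col9)]

print(dt[col3 > 5])
```

Whether this pays off depends on how selective the real condition is; the fewer rows that match, the less work the assignment does.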
