Loops inefficiency in R

暗喜 2021-01-01 02:59

Good morning,

I have been developing in R for a few months, and I need to keep the execution time of my code down because I analyze big datasets.

2 answers
  • 2021-01-01 03:37

    Just a couple of comments. A for loop is roughly as fast as apply and its variants; the real speed-ups come when you vectorise your function as much as possible (that is, let the looping happen in low-level compiled code rather than in apply, which just hides the for loop). I'm not sure if this is the best example, but consider the following:

    > n <- 1e06
    > sinI <- rep(NA,n)
    > system.time(for(i in 1:n) sinI[i] <- sin(i))
       user  system elapsed 
      3.316   0.000   3.358 
    > system.time(sinI <- sapply(1:n,sin))
       user  system elapsed 
      5.217   0.016   5.311 
    > system.time(sinI <- unlist(lapply(1:n,sin),
    +       recursive = FALSE, use.names = FALSE))
       user  system elapsed 
      1.284   0.012   1.303 
    > system.time(sinI <- sin(1:n))
       user  system elapsed 
      0.056   0.000   0.057 
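
    The transcript above can also be rerun as a script. The following is a minimal sketch, assuming a smaller n than the 1e06 used above (chosen only to keep the run short), that checks the four variants return the same values before their timings are compared. The names loop_res, sapply_res, lapply_res, vec_res and all_equal are illustrative and not part of the original answer.

```r
n <- 1e4  # assumption: smaller than the 1e06 above, to keep the run short

# 1. Explicit for loop writing into a preallocated vector
loop_res <- numeric(n)
for (i in 1:n) loop_res[i] <- sin(i)

# 2. sapply: loops in R, then simplifies the list result to a vector
sapply_res <- sapply(1:n, sin)

# 3. lapply + unlist: same loop, with a cheaper flattening step
lapply_res <- unlist(lapply(1:n, sin), recursive = FALSE, use.names = FALSE)

# 4. Vectorised call: the loop over elements runs in compiled code
vec_res <- sin(1:n)

# All four approaches should agree up to floating-point tolerance
all_equal <- isTRUE(all.equal(loop_res, vec_res)) &&
  isTRUE(all.equal(sapply_res, vec_res)) &&
  isTRUE(all.equal(lapply_res, vec_res))
print(all_equal)
```

    Wrapping each variant in system.time(), as in the transcript, then gives timings for your own machine and R version, which may differ noticeably from the figures quoted here.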
    

    In one of the comments below, Marek points out that the time-consuming part of the for loop above is actually the [<- part (the assignment sinI[i] <- ...); compare the lapply timing, which involves no such assignment:

    > system.time(sinI <- unlist(lapply(1:n,sin),
    +       recursive = FALSE, use.names = FALSE))
       user  system elapsed 
      1.284   0.012   1.303 
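
    One way to check where the time goes is to time the loop with and without the assignment. This is a rough sketch, assuming a smaller n for speed; the names t_call_only, t_with_assign and elapsed are illustrative. The gap between the two depends on the R version (for example, on whether the loop is byte-compiled), so no particular result is claimed here.

```r
n <- 1e5            # assumption: smaller than the 1e06 above
sinI <- numeric(n)  # preallocated result vector

# Loop that only calls sin() and discards the result
t_call_only <- system.time(for (i in 1:n) sin(i))

# Same loop, but storing each value with [<-
t_with_assign <- system.time(for (i in 1:n) sinI[i] <- sin(i))

# Elapsed wall-clock seconds for each variant
elapsed <- c(call_only   = t_call_only[["elapsed"]],
             with_assign = t_with_assign[["elapsed"]])
print(elapsed)
```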
    

    The bottlenecks which can't immediately be vectorised can be rewritten in C or Fortran, compiled with R CMD SHLIB, and then plugged in with .Call, .C or .Fortran.

    Also, see these links for more info about loop optimisation in R, and check out the article "How Can I Avoid This Loop or Make It Faster?" in R News.

  • 2021-01-01 03:48

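    vapply avoids the post-processing that sapply performs by requiring you to specify the type and length of each return value. In the timings below (using n and sinI as defined in the other answer) it turns out to be about 3.4 times faster than the for loop: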

    > system.time(for(i in 1:n) sinI[i] <- sin(i))
       user  system elapsed 
       2.41    0.00    2.39 
    
    > system.time(sinI <- unlist(lapply(1:n,sin), recursive = FALSE, use.names = FALSE))
       user  system elapsed 
       1.46    0.00    1.45 
    
    > system.time(sinI <- vapply(1:n,sin, numeric(1)))
       user  system elapsed 
       0.71    0.00    0.69 
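
    Besides speed, declaring the return value gives vapply a predictable result type. The sketch below illustrates that with small, made-up inputs; the variable names are illustrative only.

```r
x <- 1:5

# Each call must return exactly one double, as declared by numeric(1)
res <- vapply(x, sin, numeric(1))

# On empty input, sapply falls back to an empty list,
# while vapply still returns an empty numeric vector
empty_sapply <- sapply(integer(0), sin)
empty_vapply <- vapply(integer(0), sin, numeric(1))

# A function returning the wrong type is rejected with an error
mismatch <- tryCatch(
  vapply(x, function(i) as.character(i), numeric(1)),
  error = function(e) "error"
)

print(class(empty_sapply))
print(class(empty_vapply))
print(mismatch)
```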
    