Good morning,
I have been developing in R for a few months, and I need to make sure that my code's execution time does not get too long because I analyze big datasets.
Just a couple of comments. A for loop is roughly as fast as apply and its variants, and the real speed-ups come when you vectorise your function as much as possible (that is, by relying on vectorised functions whose loops run in compiled code, rather than apply, which just hides the for loop). I'm not sure if this is the best example, but consider the following:
> n <- 1e06
> sinI <- rep(NA,n)
> system.time(for(i in 1:n) sinI[i] <- sin(i))
user system elapsed
3.316 0.000 3.358
> system.time(sinI <- sapply(1:n,sin))
user system elapsed
5.217 0.016 5.311
> system.time(sinI <- unlist(lapply(1:n,sin),
+ recursive = FALSE, use.names = FALSE))
user system elapsed
1.284 0.012 1.303
> system.time(sinI <- sin(1:n))
user system elapsed
0.056 0.000 0.057
In one of the comments below, Marek points out that the time-consuming part of the for loop above is actually the subassignment (the [<- call) rather than sin itself.
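A quick way to see this (a sketch of my own, not Marek's original measurement) is to time the loop with the subassignment stripped out, and then a loop where the subassignment dominates:

# Uses the same n and sinI as above; neither line is from the original timings.
system.time(for(i in 1:n) sin(i))         # sin() calls only, results discarded
system.time(for(i in 1:n) sinI[i] <- 1)   # loop dominated by the [<- subassignment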
Bottlenecks that can't immediately be vectorised can be rewritten in C or Fortran, compiled with R CMD SHLIB, and then plugged in with .Call, .C or .Fortran.
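As a rough illustration of the .C route (a sketch, assuming a working C toolchain on the machine; the file name vecsin.c and the function name vecsin are made up for this example, not part of the answer):

# Write a tiny C function to a file, build it with R CMD SHLIB, and call it.
writeLines(c(
  "#include <math.h>",
  "void vecsin(double *x, int *n, double *out) {",
  "  for (int i = 0; i < *n; i++) out[i] = sin(x[i]);",
  "}"), "vecsin.c")
system("R CMD SHLIB vecsin.c")                     # produces vecsin.so (or .dll)
dyn.load(paste0("vecsin", .Platform$dynlib.ext))   # load the compiled code
x <- as.double(1:n)
sinI <- .C("vecsin", x, as.integer(length(x)), out = double(length(x)))$out

For this particular case the built-in sin(1:n) is of course already vectorised; the sketch only shows the mechanics of compiling and calling external code.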
Also, see these links for more info about loop optimisation in R, and check out the article "How Can I Avoid This Loop or Make It Faster?" in R News.
vapply avoids the post-processing that sapply does by requiring that you specify up front what each return value looks like. It turns out to be about 3.4 times faster than the for loop:
> system.time(for(i in 1:n) sinI[i] <- sin(i))
user system elapsed
2.41 0.00 2.39
> system.time(sinI <- unlist(lapply(1:n,sin), recursive = FALSE, use.names = FALSE))
user system elapsed
1.46 0.00 1.45
> system.time(sinI <- vapply(1:n,sin, numeric(1)))
user system elapsed
0.71 0.00 0.69
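The third argument is a template for a single result, and vapply checks every result against it. A small illustration of my own, not part of the original timings:

vapply(1:3, function(i) c(sin(i), cos(i)), numeric(1))  # error: each result has length 2
vapply(1:3, function(i) c(sin(i), cos(i)), numeric(2))  # fine: returns a 2 x 3 matrix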