I'm really surprised no one has posted about apply, tapply, lapply, and sapply. My general rule in R: if I have a for loop that is doing data processing or simulations, I try to factor it out and replace it with one of the *apply functions. Some people shy away from the *apply functions because they think only single-parameter functions can be passed in. Nothing could be further from the truth! Just as you pass functions around as first-class objects in JavaScript, you can pass an anonymous function in R to supply the extra parameters. For example:
> sapply(rnorm(100, 0, 1), round)
[1] 1 1 0 1 1 -1 -2 0 2 2 -2 -1 0 1 -1 0 1 -1 0 -1 0 0 0 0 0
[26] 2 0 -1 -2 0 0 1 -1 1 5 1 -1 0 1 1 1 2 0 -1 1 -1 1 0 -1 1
[51] 2 1 1 -2 -1 0 -1 2 -1 1 -1 1 -1 0 -1 -2 1 1 0 -1 -1 1 1 2 0
[76] 0 0 0 -2 -1 1 1 -2 1 -1 1 1 1 0 0 0 -1 -3 0 -1 0 0 0 1 1
> sapply(rnorm(100, 0, 1), round(x, 2)) # How can we pass a parameter?
Error in match.fun(FUN) : object 'x' not found
# Wrap your function call in an anonymous function to use parameters
> sapply(rnorm(100, 0, 1), function(x) {round(x, 2)})
[1] -0.05 -1.74 -0.09 -1.23 0.69 -1.43 0.76 0.55 0.96 -0.47 -0.81 -0.47
[13] 0.27 0.32 0.47 -1.28 -1.44 -1.93 0.51 -0.82 -0.06 -1.41 1.23 -0.26
[25] 0.22 -0.04 -2.17 0.60 -0.10 -0.92 0.13 2.62 1.03 -1.33 -1.73 -0.08
[37] 0.45 -0.93 0.40 0.05 1.09 -1.23 -0.35 0.62 0.01 -1.08 1.70 -1.27
[49] 0.55 0.60 -1.46 1.08 -1.88 -0.15 0.21 0.06 0.53 -1.16 -2.13 -0.03
[61] 0.33 -1.07 0.98 0.62 -0.01 -0.53 -1.17 -0.28 -0.95 0.71 -0.58 -0.03
[73] -1.47 -0.75 -0.54 0.42 -1.63 0.05 -1.90 0.40 -0.01 0.14 -1.58 1.37
[85] -1.00 -0.90 1.69 -0.11 -2.19 -0.74 1.34 -0.75 -0.51 -0.99 -0.36 -1.63
[97] -0.98 0.61 1.01 0.55
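As an aside, the anonymous wrapper is not the only route. sapply forwards any additional arguments to the function through its ... argument, and round() is itself vectorized, so for this particular case the three spellings below should agree. A minimal sketch; the variable names and the fixed seed are my own, chosen only to make the comparison reproducible:

```r
set.seed(42)               # fixed seed so all three versions see the same input
x <- rnorm(100, 0, 1)

via_wrapper <- sapply(x, function(v) round(v, 2))  # anonymous wrapper, as above
via_dots    <- sapply(x, round, digits = 2)        # extra argument forwarded through ...
via_vector  <- round(x, 2)                         # round() already works on whole vectors

print(identical(via_wrapper, via_dots))
print(identical(via_wrapper, via_vector))
```

The wrapper is still the most general form, since it lets you put the varying element in any argument position, not just the first.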
# Note that the anonymous function is being passed, not called. A trailing () does not call it:
> function() {print('hello #rstats')}()
function() {print('hello #rstats')}()
> a = function() {print('hello #rstats')}
> a
function() {print('hello #rstats')}
> a()
[1] "hello #rstats"
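The trailing () in the first attempt is parsed as part of the function body, which is why R just echoes the definition back. If you do want to call an anonymous function on the spot, wrapping the definition in parentheses works. A small sketch (msg is just an illustrative name):

```r
# Parenthesize the function expression so that the final () applies to the
# function itself rather than being parsed as part of its body.
msg <- (function() {
  "hello #rstats"
})()
print(msg)
```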
(For those that follow #rstats, I also posted this there).
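To round out the family named at the top, here is a short sketch of apply, tapply, lapply, and do.call on R's built-in iris data set. The data set and the variable names are only for illustration:

```r
# apply: run a function over the rows (1) or columns (2) of a matrix
m <- as.matrix(iris[, 1:4])
col_means <- apply(m, 2, mean)

# tapply: apply a function to groups defined by a factor
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# lapply: always returns a list, one element per input element
ranges <- lapply(split(iris$Sepal.Length, iris$Species), range)

# do.call: call a function with a list supplying its arguments,
# here binding the per-species ranges into one matrix
range_table <- do.call(rbind, ranges)
colnames(range_table) <- c("min", "max")
print(range_table)
```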
Remember, use apply, sapply, lapply, tapply, and do.call! Take advantage of R's vectorization. You should never walk up to a bunch of R code and see:
N = 10000
l = numeric()
for (i in seq_len(N)) {
  sim <- rnorm(1, 0, 1)
  l <- rbind(l, sim)  # allocates a new, larger l and copies the old one every iteration
}
Not only is this not vectorized, but the array structure in R is not grown the way a Python list is (over-allocating spare capacity when space runs out, IIRC). So each rbind step must first allocate a new l large enough to hold the result of rbind(), then copy over all of the previous l's contents. For fun, try the above in R and notice how long it takes (you won't even need Rprof or any timing function). Then try:
N = 10000
l <- rnorm(N, 0, 1)
The following, which preallocates l and fills it in by index, is also better than the first version:
N = 10000
l = numeric(N)  # preallocate the full vector once
for (i in seq_len(N)) {
  sim <- rnorm(1, 0, 1)
  l[i] <- sim   # fill in place, no reallocation
}
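If you would rather measure than eyeball it, something like the following should do. This is only a sketch: the three helper functions (grow, prealloc, vectorized) are names I made up to wrap the versions above, and system.time is base R's simple timer.

```r
N <- 10000

# Version 1: grow a matrix one row at a time with rbind()
grow <- function(n) {
  l <- numeric()
  for (i in seq_len(n)) l <- rbind(l, rnorm(1, 0, 1))
  l
}

# Version 2: preallocate the vector, then fill it by index
prealloc <- function(n) {
  l <- numeric(n)
  for (i in seq_len(n)) l[i] <- rnorm(1, 0, 1)
  l
}

# Version 3: a single vectorized call
vectorized <- function(n) rnorm(n, 0, 1)

# Elapsed wall-clock seconds for each version
elapsed <- c(
  grow       = system.time(res_grow <- grow(N))[["elapsed"]],
  prealloc   = system.time(res_pre  <- prealloc(N))[["elapsed"]],
  vectorized = system.time(res_vec  <- vectorized(N))[["elapsed"]]
)
print(elapsed)
```

Exact numbers depend on the machine and R version, so none are quoted here. Note that the rbind version ends up with an N x 1 matrix rather than a plain vector, which is one more reason to avoid it.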