Why not use a for loop?

野趣味 2021-01-18 01:13

I've been seeing a lot of comments among data scientists online about how for loops are not advisable. However, I recently found myself in a situation where using one was h

2 answers
  •  执念已碎
    2021-01-18 01:58

    You could write it this way, it's more compact:

    outANOVA <-
      lapply(dat,function(y)
        summary(aov(y ~ factor(time) + Error(factor(code)),data = dat)))
    

    for loops are not necessarily slower than the apply functions, but many people find them harder to read. To some extent it is a matter of taste.

    The real crime is to use a for loop when a vectorized function is available. These vectorized functions usually contain for loops written in C that are much faster (or call functions that do).
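    As a rough illustration of that point (a minimal sketch, not taken from the question; the names squares_loop and squares_vec are made up here), squaring the integers 1 to n can be written as an explicit loop or as a single vectorized expression that leaves the looping to compiled code:

```r
n <- 100000

# Explicit loop: preallocate the result, then fill it one element at a time.
squares_loop <- numeric(n)
for (i in seq_len(n)) {
  squares_loop[i] <- i^2
}

# Vectorized: `^` is applied to the whole vector in one call,
# so the element-wise loop runs in compiled code rather than in R.
squares_vec <- seq_len(n)^2

# Both approaches produce the same values.
stopifnot(isTRUE(all.equal(squares_loop, squares_vec)))
```

    When a vectorized form like this exists, it is usually both shorter and faster than any of the loop or apply variants.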

    Notice that in this case we also avoid creating a global variable y, and we don't have to initialize the list outANOVA.

    Another point, directly from this relevant post, "For loops in R and computational speed" (answer by Glen_b):

    For loops in R are not always slower than other approaches, like apply, but there's one huge bugbear: never grow an array inside a loop.

    Instead, make your arrays full-size before you loop and then fill them up.

    In your case you're growing outANOVA inside the loop, which could become a problem for big loops.
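    A minimal sketch of the preallocation pattern, assuming you want to keep the for loop. The data frame dat and the helper fit_one below are hypothetical stand-ins, not the data or the aov model from the question:

```r
set.seed(1)

# Hypothetical data: three numeric columns to process one at a time.
dat <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# Hypothetical stand-in for the per-column model fit.
fit_one <- function(y) c(mean = mean(y), sd = sd(y))

# Preallocate the result list at its final length, then fill it by index
# instead of growing it on every iteration.
out <- vector(mode = "list", length = ncol(dat))
names(out) <- names(dat)
for (j in seq_along(dat)) {
  out[[j]] <- fit_one(dat[[j]])
}

# lapply() handles the allocation itself and returns the same results.
out_lapply <- lapply(dat, fit_one)
stopifnot(identical(out, out_lapply))
```

    The loop and the lapply call give the same list here; the difference is only in who takes care of sizing the result.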

    Here is a microbenchmark of the different methods on a simple example:

    n <- 100000
    microbenchmark::microbenchmark(
    preallocated_vec  = {x <- vector(length=n); for(i in 1:n) {x[i] <- i^2}},
    preallocated_vec2 = {x <- numeric(n); for(i in 1:n) {x[i] <- i^2}},
    incremented_vec   = {x <- vector(); for(i in 1:n) {x[i] <- i^2}},
    preallocated_list = {x <- vector(mode = "list", length = n); for(i in 1:n) {x[i] <- i^2}},
    incremented_list  = {x <- list(); for(i in 1:n) {x[i] <- i^2}},
    sapply            = sapply(1:n, function(i) i^2),
    lapply            = lapply(1:n, function(i) i^2),
    times=20)
    
    # Unit: milliseconds
    # expr                     min         lq       mean     median         uq        max neval
    # preallocated_vec    9.784237  10.100880  10.686141  10.367717  10.755598  12.839584    20
    # preallocated_vec2   9.953877  10.315044  10.979043  10.514266  11.792158  12.789175    20
    # incremented_vec    74.511906  79.318298  81.277439  81.640597  83.344403  85.982590    20
    # preallocated_list  10.680134  11.197962  12.382082  11.416352  13.528562  18.620355    20
    # incremented_list  196.759920 201.418857 212.716685 203.485940 205.441188 393.522857    20
    # sapply              6.557739   6.729191   7.244242   7.063643   7.186044   9.098730    20
    # lapply              6.019838   6.298750   6.835941   6.571775   6.844650   8.812273    20
    
