How to speed up an R for loop?

醉酒成梦 2021-02-10 14:34

I am running the following for loop for the gwr.basic function in the GWmodel package in R. What I need to do is to collect the mean of estimate parameter for any given bandwidt

2 Answers
  • 2021-02-10 14:48

    I have the same impression as @musically_ut: the for loop itself, and the traditional for-vs-apply debate, is unlikely to help you here. If you have more than one core, try parallelization. There are several packages for this, such as parallel or snowfall. Which one is ultimately best and fastest depends on your machine and operating system.

    Best does not always mean fastest here. Code that works cross-platform can be worth more than a bit of extra performance, and transparency and ease of use can outweigh maximum speed. That said, I like the standard solution a lot and would recommend parallel, which ships with R and works on Windows, OS X, and Linux.

    EDIT: here is a fully reproducible version based on the OP's example.

    library(GWmodel)
    data("DubVoter")
    
    library(parallel)
    
    # bandwidths to evaluate
    bwlist <- list(bw1 = 20, bw2 = 21)
    
    # start one worker per detected core
    cl <- makeCluster(detectCores())
    
    # load 'GWmodel' for each node
    clusterEvalQ(cl, library(GWmodel))
    
    # export data to each node
    clusterExport(cl, varlist = c("bwlist","Dub.voter"))
    
    # fit one model per bandwidth in parallel; try() keeps one failed fit from aborting the rest
    out <- parLapply(cl, bwlist, function(e) {
      try(gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 +
                      Unempl + LowEduc + Age18_24 + Age25_44 +
                      Age45_64,
                    data = Dub.voter,
                    bw = e, kernel = "bisquare",
                    adaptive = TRUE, F123.test = TRUE))
    })
    
    
    # pull the LARent coefficient estimates out of each fit's SDF and average them per bandwidth
    LArent_l <- lapply(lapply(out, "[[", "SDF"), "[[", "LARent")
    unlist(lapply(LArent_l, "mean"))
    
    # finally, stop the cluster
    stopCluster(cl)
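One thing to watch with the example above: each fit is wrapped in try(), so a failed fit comes back as a "try-error" object, and extracting "SDF" from it would then fail. Here is a minimal sketch of how the failed entries could be filtered out first. It does not use GWmodel; the worker body is a hypothetical stand-in that fails on purpose for one bandwidth, and the two-worker cluster is just an assumption to keep the example small.

```r
library(parallel)

# Hypothetical bandwidths; bw2 is made to fail below for illustration.
bwlist <- list(bw1 = 20, bw2 = 21, bw3 = 22)

# Two workers are enough for this illustration (an assumption, not a recommendation).
cl <- makeCluster(2)

out <- tryCatch(
  parLapply(cl, bwlist, function(e) {
    # Stand-in for a model fit: errors for one input, returns a number otherwise.
    try({
      if (e == 21) stop("fit failed for this bandwidth")
      mean(seq_len(e))
    }, silent = TRUE)
  }),
  finally = stopCluster(cl)  # the cluster is stopped even if parLapply() itself errors
)

# Flag the failed fits, then keep only the successful results.
failed <- vapply(out, inherits, logical(1), what = "try-error")
means <- unlist(out[!failed])
```

With the real model you would apply the same `failed` mask to `out` before the nested lapply() calls that extract LARent.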
    
  • 2021-02-10 14:55

    Besides using parallelization as Matt Bannert suggests, you should preallocate the vector LARentMean. Often it is not the for loop itself that is slow, but that the loop tempts you into doing slow things such as growing a vector one element at a time.

    Consider the following example to see the impact of a growing vector as compared to preallocating the memory:

    library(microbenchmark)
    
    growing <- function(x) {
      mylist <- list()
      for (i in 1:x) {
        mylist[[i]] <- i
      }
    }
    
    allocate <- function(x) {
      mylist <- vector(mode = "list", length = x)
      for (i in 1:x) {
        mylist[[i]] <- i
      }
    }
    
    microbenchmark(growing(1000), allocate(1000), times = 1000)
    # Unit: microseconds
    #            expr      min       lq      mean   median       uq       max neval
    #   growing(1000) 3055.134 4284.202 4743.4874 4433.024 4655.616 47977.236  1000
    #  allocate(1000)  867.703  917.738  998.0719  956.441  995.143  2564.192  1000
    

    In this run the growing list is roughly 4.6 times slower by median time (4433 vs. 956 microseconds) than the version that preallocates the memory.
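Applied to a loop over bandwidths, preallocation might look like the sketch below. The `bandwidths` vector and the `estimate_mean()` helper are hypothetical stand-ins for the OP's values and for the gwr.basic call plus averaging; only the name LARentMean comes from the question.

```r
# Hypothetical bandwidths and a stand-in for the per-bandwidth computation.
bandwidths <- c(20, 21, 22)
estimate_mean <- function(bw) mean(sqrt(seq_len(bw)))

# Preallocate the result vector once, then fill it by index
# instead of appending to it on every iteration.
LARentMean <- numeric(length(bandwidths))
for (i in seq_along(bandwidths)) {
  LARentMean[i] <- estimate_mean(bandwidths[i])
}

# vapply() also returns a result of known length and checks each value's type.
LARentMean_v <- vapply(bandwidths, estimate_mean, numeric(1))
```

Both variants give the same values. With only a handful of bandwidths the gain from preallocation will be small next to the cost of the model fits themselves, so treat this as good practice, not the main speed-up.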
