Problems using foreach parallelization

花落未央 2020-12-13 16:21

I'm trying to compare parallelization options. Specifically, I'm comparing the standard SNOW and multicore implementations to those using d

2 answers
  •  醉梦人生
    2020-12-13 16:40

    To follow on something Joris said, foreach() is best when the number of jobs does not hugely exceed the number of processors you will be using. Or more generally, when each job takes a significant amount of time on its own (seconds or minutes, say). There is a lot of overhead in creating the threads, so you really don't want to use it for lots of small jobs. If you were doing 10 million sims rather than 10 thousand, and you structured your code like this:

    # size, mu and sigma are assumed to be defined as in the question
    nSims = 1e7
    nBatch = 1e6
    foreach(i=1:(nSims/nBatch), .combine=c) %dopar% {
      replicate(nBatch, mean(rnorm(n=size, mean=mu, sd=sigma)))
    }
    

    I bet you would find that foreach was doing pretty well.
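To make the overhead argument concrete, here is a minimal sketch that times one task per simulation against one task per batch. The values of size, mu and sigma are hypothetical placeholders (the question's actual values are not shown here), and %do% is used so the sketch runs without a registered parallel backend; the per-task overhead of foreach is visible even sequentially.

```r
library(foreach)

# Hypothetical parameters, chosen only for illustration.
size <- 10; mu <- 0; sigma <- 1
nSims <- 1e4
nBatch <- 1e3

# One foreach task per simulation: 10,000 tiny tasks.
t_small <- system.time(
  small <- foreach(i = 1:nSims, .combine = c) %do% {
    mean(rnorm(n = size, mean = mu, sd = sigma))
  }
)

# One foreach task per batch: 10 tasks of 1,000 simulations each.
t_batch <- system.time(
  batched <- foreach(i = 1:(nSims / nBatch), .combine = c) %do% {
    replicate(nBatch, mean(rnorm(n = size, mean = mu, sd = sigma)))
  }
)

# Elapsed seconds for each layout; both produce nSims results.
print(c(small = t_small[["elapsed"]], batched = t_batch[["elapsed"]]))
```

Both layouts return the same number of simulated means; only the number of tasks handed to foreach differs. Swapping %do% for %dopar% after registering a backend should show the same pattern, with the batched version benefiting from the extra cores.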

    Also note the use of replicate() rather than sapply() for this kind of application. The foreach package also has a similar convenience function, times(), which could be applied in this case. Of course, if your code is not running simple simulations with identical parameters every time, you will need sapply() and foreach().
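As a rough illustration of times(), the batched loop can be written without an index variable. The parameter values are again hypothetical, and %do% is used so the sketch runs without a backend; with a registered backend, %dopar% should work the same way.

```r
library(foreach)

# Hypothetical parameters, chosen only for illustration.
size <- 10; mu <- 0; sigma <- 1
nBatch <- 1e3
nTasks <- 4

# times(nTasks) evaluates the expression nTasks times and combines the results.
res <- times(nTasks) %do% replicate(nBatch, mean(rnorm(n = size, mean = mu, sd = sigma)))

print(length(res))
```

This fits only when every task evaluates the same expression; when parameters vary per task, an explicit foreach(i = ...) loop is still needed.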
