Should mclapply calls be nested?

Posted by 时光毁灭记忆、已成空白 on 2021-01-28 06:40:28

Question


Is nesting parallel::mclapply calls a good idea?

require(parallel)
ans <- mclapply(1:3, function(x) mclapply(1:3, function(y) y * x))
unlist(ans)

Outputs:

[1] 1 2 3 2 4 6 3 6 9

So it's "working". But is it recommended for real compute-intensive tasks when there are more tasks than cores? What is going on when this is executed? Are the multiple forks involved potentially more wasteful? What are the considerations for mc.cores and mc.preschedule?

Edit: Just to clarify the motivation: it often seems natural to parallelize by splitting along one dimension (e.g., use different cores to handle data from n different years), and then within each split there is another natural way to split (e.g., use different cores to calculate each one of m different functions). When m times n is smaller than the total number of available cores, the above nesting looks sensible, at least on the face of it.
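For comparison, the n-by-m split described above can also be written as one flat task list instead of two nested calls. The following is only a sketch under assumptions: the names years, funs and grid are hypothetical, the "work" is the same toy multiplication as in the example above, and mc.cores = 2 is an arbitrary choice.

```r
library(parallel)

# Hypothetical inputs: three "years" and three "functions"
# (here just multipliers, as in the toy example above).
years <- 1:3
funs  <- 1:3

# One row per (x, y) pair; expand.grid varies its first column fastest,
# so the row order matches the order of the nested version.
grid <- expand.grid(y = funs, x = years)

# A single, non-nested mclapply over all pairs.
# mc.cores = 2 is arbitrary; on Windows, mc.cores must be 1.
ans <- mclapply(seq_len(nrow(grid)),
                function(i) grid$y[i] * grid$x[i],
                mc.cores = 2)
unlist(ans)
```

This returns the same nine values as the nested call, while mc.cores is set in exactly one place.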


Answer 1:


In the following experiment, a single (non-nested) parallel call of the test function testfn() ran faster than the nested parallel call:

library(parallel)
library(microbenchmark)
testfn <- function(x) rnorm(10000000)

microbenchmark('parallel'= o <- mclapply(1:8, testfn, mc.cores=4),
               'nested'  = o <- mclapply(1:2, function(x) mclapply(1:4, testfn, mc.cores=2), 
                                         mc.cores=2),
               times=10)
Unit: seconds
     expr      min       lq     mean   median       uq      max neval
 parallel 3.727131 3.756445 3.802470 3.815977 3.834144 3.890128    10
   nested 4.355846 4.372996 4.508291 4.453881 4.578837 4.863664    10

Explanation:
Communication between the R session and four R workers seems to be more efficient than communication between the R session and two workers, each of which in turn forks and communicates with two further workers.
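One way to see the extra layer of processes described in the explanation above is to record process IDs at both levels. This is a minimal sketch, assuming a Unix-like system where mclapply forks; the variable names are illustrative only.

```r
library(parallel)

# Each outer task records its own process ID and the IDs of the
# processes that ran its inner tasks.
res <- mclapply(1:2, function(x) {
  inner <- mclapply(1:2, function(y) Sys.getpid(), mc.cores = 2)
  list(outer_pid = Sys.getpid(), inner_pids = unlist(inner))
}, mc.cores = 2)

main_pid   <- Sys.getpid()
outer_pids <- vapply(res, function(r) r$outer_pid, integer(1))
inner_pids <- unlist(lapply(res, function(r) r$inner_pids))

# Count the distinct processes involved: the main session,
# the outer workers, and the inner workers forked by each outer worker.
length(unique(c(main_pid, outer_pids, inner_pids)))
```

With these settings one would typically see the main session, two outer workers and four inner workers, so each inner result is passed back through an intermediate process before it reaches the main session.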

Alternative:
The package foreach can handle nested loops, which is close to nested mclapply() calls; see the vignette https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf.
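As a rough illustration of the foreach approach, the toy example from the question could be written with the nesting operator %:% as follows. This is a sketch, not the vignette's code: it assumes the foreach and doParallel packages are installed, and the choice of doParallel as the backend and of cores = 2 is an assumption made here for illustration.

```r
library(foreach)
library(doParallel)

# Register a parallel backend (assumed here: doParallel with 2 cores).
registerDoParallel(cores = 2)

# %:% merges the two loops into a single stream of tasks,
# which %dopar% hands to the registered backend.
ans <- foreach(x = 1:3, .combine = c) %:%
  foreach(y = 1:3, .combine = c) %dopar% {
    y * x
  }
ans
```

Because the two loops are merged into one stream of tasks, the number of workers is set once, in the backend registration, rather than at each nesting level.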

(The optimal setting of the argument mc.preschedule depends on the specific problem; see the help page ?mclapply.)
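To compare the two mc.preschedule settings on a given problem, a small timing sketch along these lines may help. The workload here is hypothetical (durations and slow_task are made-up names, and Sys.sleep stands in for real computation), so the timings say nothing about any real task.

```r
library(parallel)

# Hypothetical workload: task durations (in seconds) are deliberately uneven.
durations <- c(0.4, 0.1, 0.1, 0.1)
slow_task <- function(d) {
  Sys.sleep(d)  # stand-in for real computation
  d
}

# mc.preschedule = TRUE (the default): tasks are divided among
# the workers up front.
t_pre <- system.time(
  r_pre <- mclapply(durations, slow_task, mc.cores = 2, mc.preschedule = TRUE)
)[["elapsed"]]

# mc.preschedule = FALSE: one fork per task, with at most mc.cores
# tasks running at the same time.
t_dyn <- system.time(
  r_dyn <- mclapply(durations, slow_task, mc.cores = 2, mc.preschedule = FALSE)
)[["elapsed"]]

c(prescheduled = t_pre, dynamic = t_dyn)
```

Both calls return the same results; only the scheduling differs. Forking once per task adds overhead, so mc.preschedule = FALSE tends to be of interest mainly when tasks are few and uneven in duration.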



Source: https://stackoverflow.com/questions/51707443/should-mclapply-calls-be-nested
