R mclapply vs foreach

鱼传尺愫 2021-02-07 06:27

I use mclapply for all my "embarrassingly parallel" computations. I find it clean and easy to use, and when arguments mc.cores = 1 and mc.preschedule = TRUE

2 Answers
  • 2021-02-07 06:40

    The problem is almost the same as the one described here: Understanding the differences between mclapply and parLapply in R.

    mclapply forks the master process: each worker process is a clone of the master, created at the point where mclapply is called, so the master's environment (variables, functions, loaded packages) is reproduced in every worker without any extra setup. Unfortunately, forking isn't possible on Windows. There, in contrast to multicore (forked) parallelism, foreach and parLapply always use multisession parallelism, where each worker is a separate, fresh R session.
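
    A minimal sketch of what that means in practice, using only the parallel package that ships with R. The names offset and add_offset are hypothetical, made up for this illustration; the point is that nothing has to be exported to the workers:

    ```r
    library(parallel)

    # Hypothetical data and helper, defined only in the master session.
    offset <- 100
    add_offset <- function(x) x + offset

    # Forked workers are clones of the master, so they already see `offset`
    # and `add_offset`. Forking is unavailable on Windows, where mclapply
    # only accepts mc.cores = 1 and runs sequentially.
    n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
    res_fork <- mclapply(1:4, add_offset, mc.cores = n_cores)
    ```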

    When using parLapply or foreach with %dopar%, you generally have to perform the following additional steps: (1) create a PSOCK cluster, (2) register the cluster if desired, (3) load the necessary packages on the cluster workers, and (4) export the necessary data and functions to the global environment of the cluster workers.
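
    The steps above can be sketched with parLapply as follows. This is an illustration under assumptions, not a recipe: offset and add_offset are hypothetical names, two workers is an arbitrary choice, and the tools package only stands in for whatever packages the workers really need.

    ```r
    library(parallel)

    # Hypothetical data and helper, defined only in the master session.
    offset <- 100
    add_offset <- function(x) x + offset

    # 1. Create a PSOCK cluster: two fresh R sessions that know nothing
    #    about the master's variables.
    cl <- makePSOCKcluster(2)

    # 2. Registering is only needed for foreach (e.g. registerDoParallel(cl));
    #    parLapply takes the cluster object directly.

    # 3. Load the packages the workers need (tools is just a placeholder).
    invisible(clusterEvalQ(cl, library(tools)))

    # 4. Export the data the function relies on to the workers' global
    #    environment. add_offset itself is shipped as the parLapply argument.
    clusterExport(cl, "offset", envir = environment())

    res_psock <- parLapply(cl, 1:4, add_offset)
    stopCluster(cl)
    ```

    Leaving out step 4 would typically make the workers fail with an "object 'offset' not found" error, which is exactly the setup that mclapply spares you.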

    That is why foreach has parameters like .packages and .export, which let us distribute everything the tasks need across the worker sessions.
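
    A small sketch of both parameters, assuming the foreach and doParallel packages are installed. The names prefix and label_files are hypothetical. foreach exports variables it finds in the calling environment on its own, so .export matters mostly in cases like this one, where the loop runs inside a function and refers to a variable defined outside it:

    ```r
    library(foreach)
    library(doParallel)

    cl <- parallel::makePSOCKcluster(2)
    registerDoParallel(cl)

    # Hypothetical variable that lives outside the function below.
    prefix <- "type:"

    label_files <- function(files) {
      foreach(f = files, .combine = c,
              .packages = "tools",    # attach tools on each worker so file_ext() resolves
              .export = "prefix") %dopar% {  # prefix is not local to this function
        paste0(prefix, file_ext(f))
      }
    }

    labels <- label_files(c("a.csv", "b.json"))
    parallel::stopCluster(cl)
    ```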

    The future package documents the differences between multicore and multisession processing in detail: https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html

  • 2021-02-07 06:52

    As Steve Weston (author of foreach) says here, when using foreach with doParallel as the backend you can initialize the workers. This can be helpful, for example, for setting up a database connection once per worker instead of once per task.
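
    A rough sketch of that idea, assuming foreach and doParallel are installed. To keep it runnable without a database, conn is a hypothetical stand-in object rather than a real connection; in real code the clusterEvalQ block would load the database package and open the connection there.

    ```r
    library(foreach)
    library(doParallel)

    cl <- parallel::makePSOCKcluster(2)
    registerDoParallel(cl)

    # Runs once per worker: create a long-lived object in the worker's
    # global environment. `conn` stands in for a database connection.
    invisible(parallel::clusterEvalQ(cl, {
      conn <- list(pid = Sys.getpid(), opened = Sys.time())
      NULL  # return NULL so the object is not sent back to the master
    }))

    # Every task reuses the worker-local `conn`. .noexport keeps foreach from
    # shipping a master-side variable of the same name, should one exist.
    pids <- foreach(i = 1:8, .combine = c, .noexport = "conn") %dopar% {
      conn$pid
    }
    parallel::stopCluster(cl)
    ```

    Eight tasks ran here, but the setup code ran only once in each of the two workers, so pids contains at most two distinct process IDs.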
