I use mclapply for all my "embarrassingly parallel" computations. I find it clean and easy to use, and when arguments mc.cores = 1
and mc.preschedule = TRUE
The problem is almost the same as the one described here: Understanding the differences between mclapply and parLapply in R.
Because mclapply clones (forks) the master process for each worker process at the point where mclapply is called, each worker starts from the same state as the master and reproducibility is guaranteed. Unfortunately, forking isn't possible on Windows, where, in contrast to multicore, multisession parallelism is always used by foreach or parLapply.
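A minimal sketch of the forking behaviour, using only the base parallel package. The names scale_factor and scaled_square are made up for illustration, and the fallback to one core on Windows is an assumption added here so the snippet runs everywhere:

```r
library(parallel)

# Hypothetical global state that the workers need.
scale_factor <- 3
scaled_square <- function(x) (x^2) * scale_factor

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# No export step: forked workers inherit scale_factor and scaled_square
# from the master process as they are at the time of the call.
res <- mclapply(1:4, scaled_square, mc.cores = n_cores)
unlist(res)
```

Nothing has to be shipped to the workers explicitly here, which is the convenience that is lost with multisession backends.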
When using parLapply or foreach with %dopar%, you generally have to perform the following additional steps: create a PSOCK cluster; register the cluster, if desired; load the necessary packages on the cluster workers; and export the necessary data and functions to the global environment of the cluster workers.
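The steps above could look roughly like this with parLapply. This is only a sketch: suffix and tag_ext are hypothetical names, and the tools package stands in for whatever packages the real task needs. Registering the cluster only matters for foreach, so it is skipped here.

```r
library(parallel)

# Create a PSOCK cluster with two fresh R sessions.
cl <- makeCluster(2)

# Load the necessary packages on every worker (tools is just an example).
invisible(clusterEvalQ(cl, library(tools)))

# Hypothetical data and function used by the task.
suffix <- "_done"
tag_ext <- function(f) paste0(file_ext(f), suffix)

# Export the data the function depends on to the workers' global environment.
clusterExport(cl, "suffix")

# tag_ext itself is sent along as an argument of parLapply.
res <- parLapply(cl, c("a.csv", "b.txt"), tag_ext)

stopCluster(cl)
unlist(res)
```

Without the clusterEvalQ and clusterExport calls, the workers would fail because file_ext and suffix do not exist in their sessions.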
That is why foreach
has parameters like .packages
and .export
which enable us to distribute everything needed across sessions.
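As a rough illustration of these two parameters (assuming the foreach and doParallel packages are installed): prefix and run_job are hypothetical names, and tools is again only an example package. The loop is wrapped in a function on purpose, because in that case a variable that lives in the global environment has to be listed in .export by hand.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Hypothetical global variable needed inside the loop body.
prefix <- "ext:"

run_job <- function(files) {
  # prefix is not local to this function, so it is named in .export;
  # .packages attaches tools on each worker so file_ext() can be called.
  foreach(f = files, .combine = c,
          .export = "prefix",
          .packages = "tools") %dopar% {
    paste0(prefix, file_ext(f))
  }
}

res <- run_job(c("a.csv", "b.json"))

parallel::stopCluster(cl)
res
```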
The future package documentation details the differences between multicore and multisession processing: https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
As Steve Weston (author of foreach) says here, when using foreach with doParallel as the backend, you can initialize the workers yourself. This can be helpful for setting up a database connection more efficiently: once per worker instead of once per task.
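One way this per-worker initialization might be sketched (assuming foreach and doParallel are installed) is to run the setup with clusterEvalQ before registering the cluster. No real database is used here: fake_conn is a hypothetical stand-in for a connection object, and it simply records the worker's process id.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)

# Runs once on each worker. A real setup might open a database connection
# here; fake_conn is only a placeholder for it.
invisible(parallel::clusterEvalQ(cl, {
  fake_conn <- list(pid = Sys.getpid())
  NULL  # return nothing, so the object is not sent back to the master
}))

registerDoParallel(cl)

# Eight tasks reuse the object created during initialization.
pids <- foreach(i = 1:8, .combine = c) %dopar% {
  fake_conn$pid
}

parallel::stopCluster(cl)

# At most two distinct values: one setup per worker, not one per task.
length(unique(pids))
```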