I often end up with several nested foreach loops, and sometimes, when writing general functions (e.g. for a package), there is no level that is obviously the right one to parallelize.
The issue that you raise was the motivation for the foreach nesting operator, '%:%'. If the body of the inner loop takes a substantial amount of compute time, you're pretty safe using:
foreach(i = 1:I) %:%
  foreach(j = 1:J) %dopar% {
    # Do stuff
  }
This "unrolls" the nested loops, resulting in (I * J) tasks that can all be executed in parallel.
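You can see the unrolled structure with a small sequential sketch (using `%do%`, so no parallel backend needs to be registered; the `10*i + j` body is just for illustration):

```r
library(foreach)

I <- 2; J <- 3
# %:% merges the two loops into a single stream of I * J = 6 tasks;
# the outer .combine concatenates the inner results in order
res <- foreach(i = 1:I, .combine='c') %:%
  foreach(j = 1:J, .combine='c') %do% (10 * i + j)
res  # c(11, 12, 13, 21, 22, 23)
```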
If the body of the inner loop doesn't take much time, the solution is more difficult. The standard solution is to parallelize the outer loop, but that could still result in either many small tasks (when I is large and J is small) or a few large tasks (when I is small and J is large).
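As a sketch of that standard approach (the doParallel backend and two workers here are just for illustration), only the outer loop is sent to the workers, and the inner loop runs sequentially on each one:

```r
library(doParallel)
registerDoParallel(2)

I <- 2; J <- 3
# Only the outer loop is parallelized; sapply runs the inner
# iterations sequentially inside each worker, producing I tasks
res <- foreach(i = 1:I, .combine='rbind') %dopar% {
  sapply(1:J, function(j) i * j)
}
stopImplicitCluster()
```

With a large I and small J this creates many tiny tasks; with a small I and large J it creates only a few big ones, which is exactly the imbalance described above.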
My favorite solution is to use the nesting operator with task chunking. Here's a complete example using the doMPI backend:
library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
I <- 100; J <- 2
opt <- list(chunkSize=10)
foreach(i = 1:I, .combine='cbind', .options.mpi=opt) %:%
  foreach(j = 1:J, .combine='c') %dopar% {
    (i * j)
  }
closeCluster(cl)
This results in 20 "task chunks", each consisting of 10 computations of the loop body. If you want to have a single task chunk for each worker, you can compute the chunk size as:
cs <- ceiling((I * J) / getDoParWorkers())
opt <- list(chunkSize=cs)
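For example, with the I = 100 and J = 2 from above and, say, 10 workers (the worker count is hypothetical), that works out to:

```r
# 200 tasks spread over 10 workers: one chunk of 20 tasks per worker
cs <- ceiling((100 * 2) / 10)
cs  # 20
```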
Unfortunately, not all parallel backends support task chunking. Also, doMPI doesn't support Windows.
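On backends without chunking support you can still use the plain unrolled form, since the (I * J) tasks alone give the scheduler something to balance. A minimal sketch with doParallel (which does run on Windows; the cluster size and loop bounds are just for illustration):

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

I <- 4; J <- 2
# Same unrolled nested loop as above, minus the .options.mpi
# argument; each of the I * J tasks is scheduled individually
res <- foreach(i = 1:I, .combine='cbind') %:%
  foreach(j = 1:J, .combine='c') %dopar% {
    (i * j)
  }

stopCluster(cl)
```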
For more information on this topic, see my vignette "Nesting Foreach Loops" in the foreach package:
library(foreach)
vignette('nesting')