I often end up with several nested foreach loops and sometimes when writing general functions (e.g. for a package) there is no level which is obvious to parallelize.
If you end up with several nested foreach loops, I'd rethink the approach. Using parallel versions of tapply can solve a lot of that hassle. In general, you shouldn't nest parallelization, because it gains you nothing. Parallelize the outer loop and leave the inner loop sequential.
The reason is simple: if you have 3 connections in your cluster, the outer dopar loop will use all three. The inner dopar loop cannot use any extra connections, because none are available, so you gain nothing. Hence the mock-up you give doesn't make sense from a programming point of view.
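As a minimal sketch of "parallelize the outer loop only", assuming a two-worker local socket cluster like the one used further down (the loop bounds and the i * j payload are made up for illustration), the outer loop uses %dopar% while the inner loop uses the sequential %do%:

```r
library(foreach)
library(doSNOW)

# Assumed setup: two local socket workers, as in the example further down.
cl <- makeCluster(rep("localhost", 2), type = "SOCK")
registerDoSNOW(cl)

# The outer loop is spread over the workers. The inner loop runs
# sequentially (%do%) inside whichever worker handles that value of i.
# .packages = "foreach" makes sure %do% is available on the workers.
res <- foreach(i = 1:3, .combine = rbind, .packages = "foreach") %dopar% {
  foreach(j = 1:4, .combine = c) %do% (i * j)
}

stopCluster(cl)
registerDoSEQ()  # avoid leaving a dead cluster registered

print(res)
```

Each worker gets whole iterations of the outer loop, so no connection sits idle waiting for an inner loop it cannot serve.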
Your second question is answered pretty easily by the function getDoParRegistered(), which returns TRUE when a backend is registered and FALSE otherwise. Pay attention though: it keeps returning TRUE after the cluster has been stopped, e.g.:
require(foreach)
require(doSNOW)
cl <- makeCluster(rep("localhost",2),type="SOCK")
getDoParRegistered()
[1] FALSE
registerDoSNOW(cl)
getDoParRegistered()
[1] TRUE
stopCluster(cl)
getDoParRegistered()
[1] TRUE
But now running this code:
a <- matrix(1:16, 4, 4)
b <- t(a)
foreach(b=iter(b, by='col'), .combine=cbind) %dopar%
(a %*% b)
results in an error:
Error in summary.connection(connection) : invalid connection
You could build in an extra check. A (hideously ugly) hack to check that the connection registered by doSNOW is still valid could be:
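isvalid <- function(){
  if (getDoParRegistered()) {
    # Relies on unexported foreach internals, which may differ between versions
    X <- foreach:::.foreachGlobals$objs[[1]]$data
    # Printing a cluster whose connections are closed throws an error
    x <- try(capture.output(print(X)), silent = TRUE)
    !inherits(x, "try-error")
  } else {
    FALSE
  }
}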
You could use it as:
if(!isvalid()) registerDoSEQ()
This registers the sequential backend if getDoParRegistered() returns TRUE but there is no longer a valid cluster connection. But again, this is a hack, and I have no idea whether it works with other backends or even with other cluster types (I mostly use sockets).
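A possible alternative that avoids touching foreach internals is to run a trivial %dopar% task and see whether it fails. This is only a sketch: backend_works() is a hypothetical helper name, and I'm assuming the registered backend raises an R error (like the "invalid connection" one above) when its cluster is gone, rather than hanging.

```r
library(foreach)

# Hypothetical helper: TRUE if a backend is registered and a trivial
# %dopar% task completes without raising an error.
backend_works <- function() {
  if (!getDoParRegistered()) return(FALSE)
  tryCatch({
    foreach(i = 1, .combine = c) %dopar% i
    TRUE
  }, error = function(e) FALSE)
}

# Fall back to the sequential backend when nothing usable is registered.
if (!backend_works()) registerDoSEQ()
```

The cost is one round trip to a worker per check, so it is better called once before a batch of loops than inside them.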