Is there a fast way to iterate through combinations like those returned by expand.grid or CJ (from data.table)? These get too big to fit in memory.
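For a sense of scale (the numbers here are just illustrative, not from my real data), the result grows as the product of the input sizes:

library(data.table)
# 1e4 x 1e4 keys already give 1e8 rows, i.e. roughly 0.8 GB for just two
# integer columns (1e8 rows * 2 columns * 4 bytes), before any computation
g <- CJ(x = 1:1e3, y = 1:1e3)          # 1e6 rows is still manageable...
format(object.size(g), units = "MB")   # ...but memory scales with the product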
I think you'll get better performance if you give each worker a chunk of one of the data frames, have each worker perform its share of the computations, and then combine the results. That keeps the computation efficient and reduces memory use on the workers, since no single process ever has to build the full grid of combinations.
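As a quick illustration of the chunking itself (a minimal sketch, separate from your problem), isplitRows from the itertools package turns a data frame into an iterator over contiguous blocks of rows:

library(itertools)
library(foreach)

small <- data.frame(x = 1:10)
# Walk the chunks serially with %do% and record each chunk's size;
# for 10 rows split into 3 chunks I'd expect sizes like 4 3 3
foreach(chunk = isplitRows(small, chunks = 3), .combine = c) %do% nrow(chunk)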
Here is a full example that uses the isplitRows function from the itertools package:
library(doParallel)
library(itertools)

dim1 <- 10
dim2 <- 100
df1 <- data.frame(a = 1:dim1, b = 1:dim1)
df2 <- data.frame(x = 1:dim2, y = 1:dim2, z = 1:dim2)
f <- function(...) sum(...)

nw <- 4                 # number of workers
cl <- makeCluster(nw)
registerDoParallel(cl)

# Each task receives one chunk of df2's rows, builds the grid of
# combinations against all of df1 locally, and returns its results,
# which .combine = c concatenates into a single vector.
res <- foreach(d2 = isplitRows(df2, chunks = nw), .combine = c) %dopar% {
  expgrid <- expand.grid(x = seq(dim1), y = seq(nrow(d2)))
  apply(expgrid, 1, function(i) f(df1[i[["x"]], ], d2[i[["y"]], ]))
}
I split df2 because it has more rows, but you could choose either data frame.
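For a problem this small you can sanity-check the chunked result against the plain serial version (this assumes the full grid still fits in memory, which is exactly what you're trying to avoid at scale), and remember to stop the cluster when you're done:

# Serial reference: build the full grid and apply f the same way
expgrid <- expand.grid(x = seq(dim1), y = seq(dim2))
serial <- apply(expgrid, 1, function(i) f(df1[i[["x"]], ], df2[i[["y"]], ]))

# foreach combines chunk results in order by default (.inorder = TRUE),
# so the two vectors should line up element for element
all.equal(res, serial)

stopCluster(cl)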