Is there an efficient way to parallelize mapply?

后端未结

关注

 3  1444

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn\'t stopped running my code for two days now. N

相关标签:

3条回答

清歌不尽

2021-01-05 12:28
I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:
```
dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
          function(x) do.call(get_uniroot,as.list(dfm[x,])),
          mc.cores=4L
          )
unlist(result)
```
This is basically doing the same mapply does, but in a parallel way.

But...

Mind you that parallelization always counts for some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function calculates significantly longer than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider to cut your data frame in bigger chunks, and combine both mapply and mclapply. A possible way to do this is:
```
ncores <- 4
id <- floor(
        quantile(0:nrow(df),
                 1-(0:ncores)/ncores
        )
      )
idm <- embed(id,2)

mapply_uniroot <- function(id){
  tmp <- df[(id[1]+1):id[2],]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
result <-mclapply(nrow(idm):1,
                  function(x) mapply_uniroot(idm[x,]),
                  mc.cores=ncores)
final <- unlist(result)
```
This might need some tweaking, but it essentially breaks your df in exactly as many bits as there are cores, and run the mapply on every core. To show this works :
```
> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

你的背包

2021-01-05 12:28

This isn't exactly a best practices suggestion, but considerable speed-up can be had by identifying the root for all parameters in a 'vectorized' fashion. For instance,

bisect <-
    function(f, interval, ..., lower=min(interval), upper=max(interval),
             f.lower=f(lower, ...), f.upper=f(upper, ...), maxiter=20)
{
    nrow <- length(f.lower)
    bounds <- matrix(c(lower, upper), nrow, 2, byrow=TRUE)
    for (i in seq_len(maxiter)) {
        ## move lower or upper bound to mid-point, preserving opposite signs
        mid <- rowSums(bounds) / 2
        updt <- ifelse(f(mid, ...) > 0, 0L, nrow) + seq_len(nrow)
        bounds[updt] <- mid
    }
    rowSums(bounds) / 2
}

and then

> system.time(x2 <- with(df, {
+     f <- function(x, PB0, CF1, CF2, CF3)
+         PB0 + CF1/x + CF2/x^2 + CF3/x^3
+     bisect(f, c(1, 10), PB0, CF1, CF2, CF3)
+ }))
   user  system elapsed 
  0.180   0.000   0.181 
> range(x1 - x2)
[1] -6.282406e-06  6.658593e-06

versus about 1.3s for application of uniroot separately to each. This also combined P and B0 into a single value ahead of time, since that is how they enter the equation.

The bounds on the final value are +/- diff(interval) * (.5 ^ maxiter) or so. A fancier implementation would replace bisection with linear or quadratic interpolation (as in the reference cited in ?uniroot), but then uniform efficient convergence (and in all cases error handling) would be more tricky to arrange.

0 讨论(0)

鱼传尺愫

2021-01-05 12:43

it's an old topic but fyi you now have parallel::mcmapply doc is here. don't forget to set mc.cores in the options. I usually use mc.cores=parallel::detectCores()-1 to let one cpu free for OS operations.

x4 <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,mc.cores=parallel::detectCores()-1)

0 讨论(0)
发布评论:

提交评论
- 加载中...