I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn\'t stopped running my code for two days now. N
I'd use the parallel
package that's built into R 2.14 and work with matrices. You could then simply use mclapply
like this:
dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
function(x) do.call(get_uniroot,as.list(dfm[x,])),
mc.cores=4L
)
unlist(result)
This is basically doing the same mapply does, but in a parallel way.
But...
Mind you that parallelization always counts for some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function calculates significantly longer than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider to cut your data frame in bigger chunks, and combine both mapply and mclapply. A possible way to do this is:
ncores <- 4
id <- floor(
quantile(0:nrow(df),
1-(0:ncores)/ncores
)
)
idm <- embed(id,2)
mapply_uniroot <- function(id){
tmp <- df[(id[1]+1):id[2],]
mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
result <-mclapply(nrow(idm):1,
function(x) mapply_uniroot(idm[x,]),
mc.cores=ncores)
final <- unlist(result)
This might need some tweaking, but it essentially breaks your df in exactly as many bits as there are cores, and run the mapply on every core. To show this works :
> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
This isn't exactly a best practices suggestion, but considerable speed-up can be had by identifying the root for all parameters in a 'vectorized' fashion. For instance,
bisect <-
function(f, interval, ..., lower=min(interval), upper=max(interval),
f.lower=f(lower, ...), f.upper=f(upper, ...), maxiter=20)
{
nrow <- length(f.lower)
bounds <- matrix(c(lower, upper), nrow, 2, byrow=TRUE)
for (i in seq_len(maxiter)) {
## move lower or upper bound to mid-point, preserving opposite signs
mid <- rowSums(bounds) / 2
updt <- ifelse(f(mid, ...) > 0, 0L, nrow) + seq_len(nrow)
bounds[updt] <- mid
}
rowSums(bounds) / 2
}
and then
> system.time(x2 <- with(df, {
+ f <- function(x, PB0, CF1, CF2, CF3)
+ PB0 + CF1/x + CF2/x^2 + CF3/x^3
+ bisect(f, c(1, 10), PB0, CF1, CF2, CF3)
+ }))
user system elapsed
0.180 0.000 0.181
> range(x1 - x2)
[1] -6.282406e-06 6.658593e-06
versus about 1.3s for application of uniroot separately to each. This also combined P and B0 into a single value ahead of time, since that is how they enter the equation.
The bounds on the final value are +/- diff(interval) * (.5 ^ maxiter)
or so. A fancier implementation would replace bisection with linear or quadratic interpolation (as in the reference cited in ?uniroot
), but then uniform efficient convergence (and in all cases error handling) would be more tricky to arrange.
it's an old topic but fyi you now have parallel::mcmapply
doc is here. don't forget to set mc.cores
in the options. I usually use mc.cores=parallel::detectCores()-1
to let one cpu free for OS operations.
x4 <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,mc.cores=parallel::detectCores()-1)