I'm running R on a Linux box with 8 multicore processors and have an optimization problem I'd like to speed up by parallelizing the optimization routine itself. Importantly
I am the author of the R package optimParallel. It provides parallel versions of the gradient-based optimization methods of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce optimization times, as illustrated in the following figure (p is the number of parameters).
See https://cran.r-project.org/package=optimParallel and http://arxiv.org/abs/1804.11058 for more information.
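As a minimal usage sketch (adapted from the package documentation; negll and x below are hypothetical, a normal negative log-likelihood and simulated data), optimParallel() picks up the cluster registered with setDefaultCluster():
library(optimParallel)
library(parallel)
## hypothetical objective: negative log-likelihood of a normal sample x
negll <- function(par, x) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
x <- rnorm(1000, mean = 5, sd = 2)
cl <- makeCluster(detectCores() - 1)  # create a local cluster
setDefaultCluster(cl = cl)            # optimParallel() uses the default cluster
## same usage as optim(); the package parallelizes the L-BFGS-B method
fit <- optimParallel(par = c(1, 1), fn = negll, x = x, lower = c(-Inf, 0.0001))
fit$par
stopCluster(cl)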
I used the doSNOW package to run code on 8 cores. Here is the part of my code that uses the package; I hope it helps!
# load the multicore libraries
library(foreach)
library(doSNOW)
# specify the number of cores to use
cores <- 8
cluster <- makeCluster(cores, type = "SOCK")
registerDoSNOW(cluster)
# check how many workers foreach will actually use
ncores <- getDoParWorkers()
print(paste("Computing algorithm on ", ncores, " cores", sep = ""))
fph <- rep(-100, 12)  # placeholder, overwritten by the foreach result below
# start the multicore loop over the 12 subsets and combine the results into a vector
fph <- foreach(i = 1:12, .combine = 'c') %dopar% {
  PhenoRiceRun(sub = i, mpath = MODIS_LOCAL_DIR, masklocaldir = MASK_LOCAL_DIR,
               startYear = startYear, tile = tile, evismoothopt = FALSE)
}
stopCluster(cluster)  # shut down the workers
gc(verbose = FALSE)   # free memory
As you have not accepted an answer, this idea might help:
For global optimisation, the DEoptim package has a built-in option for parallel optimisation. The nice thing is that it's easy to use and its documentation is well written.
cf. http://www.jstatsoft.org/v40/i06/paper (currently down)
http://cran.r-project.org/web/packages/DEoptim/index.html
Beware: Differential Evolution and other global optimizers might still run into local optima.
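For illustration, here is a rough sketch of switching on the parallel option (the control interface has changed across DEoptim versions, so check ?DEoptim.control; parallelType = "parallel" corresponds to 1 in older releases, and the Rosenbrock objective is just a stand-in):
library(DEoptim)
## Rosenbrock banana function as a stand-in objective
rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
## parallelType = "parallel" evaluates the population across the local cores
## via the parallel package; use the 'packages' and 'parVar' control arguments
## to ship libraries and objects to the workers if the objective needs them
ctrl <- DEoptim.control(NP = 80, itermax = 200, trace = FALSE,
                        parallelType = "parallel")
fit <- DEoptim(rosen, lower = c(-5, -5), upper = c(5, 5), control = ctrl)
fit$optim$bestmem  # best parameter vector found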
Here is a rough solution that at least has some promise. Big thanks to Ben Bolker for pointing out that many/most optimization routines allow user-specified gradient functions.
A test problem with more parameter values might show more significant improvements, but on an 8-core machine the run using the parallelized gradient function takes about 70% as long as the serial version. Note that the crude gradient approximation used here seems to slow convergence and thus adds some time to the process.
## Set up the cluster
require("parallel");
.nlocalcores = NULL; # Default to "cores available - 1" if NULL.
if(is.null(.nlocalcores)) { .nlocalcores = detectCores() - 1; }
if(.nlocalcores < 1) { stop("Multiple cores unavailable! See code!!"); }
print(paste("Using", .nlocalcores, "cores for parallelized gradient computation."))
.cl = makeCluster(.nlocalcores);
print(.cl)
# Now define a gradient function: both in serial and in parallel
mygr <- function(.params, ...) {
dp = cbind(rep(0,length(.params)),diag(.params * 1e-8)); # TINY finite difference
Fout = apply(dp,2, function(x) fn(.params + x,...)); # Serial
return((Fout[-1]-Fout[1])/diag(dp[,-1])); # finite difference
}
mypgr <- function(.params, ...) { # Now use the cluster
dp = cbind(rep(0,length(.params)),diag(.params * 1e-8));
Fout = parCapply(.cl, dp, function(x) fn(.params + x,...)); # Parallel
return((Fout[-1]-Fout[1])/diag(dp[,-1])); #
}
## Let's try it out!
fr <- function(x, slow=FALSE) { ## Rosenbrock Banana function from optim() documentation.
if(slow) { Sys.sleep(0.1); } ## Modified to be a little slow, if needed.
x1 <- x[1]
x2 <- x[2]
100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x, slow=FALSE) { ## Gradient of 'fr'
if(slow) { Sys.sleep(0.1); } ## Modified to be a little slow, if needed.
x1 <- x[1]
x2 <- x[2]
c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
200 * (x2 - x1 * x1))
}
## Make sure the nodes can see these functions & other objects as called by the optimizer
fn <- fr; # A bit of a hack
clusterExport(.cl, "fn");
# First, test our gradient approximation function mypgr
print( mypgr(c(-1.2,1)) - grr(c(-1.2,1)))
## Some test calls, following the examples in the optim() documentation
tic = Sys.time();
fit1 = optim(c(-1.2,1), fr, slow=FALSE); toc1=Sys.time()-tic
fit2 = optim(c(-1.2,1), fr, gr=grr, slow=FALSE, method="BFGS"); toc2=Sys.time()-tic-toc1
fit3 = optim(c(-1.2,1), fr, gr=mygr, slow=FALSE, method="BFGS"); toc3=Sys.time()-tic-toc1-toc2
fit4 = optim(c(-1.2,1), fr, gr=mypgr, slow=FALSE, method="BFGS"); toc4=Sys.time()-tic-toc1-toc2-toc3
## Now slow it down a bit
tic = Sys.time();
fit5 = optim(c(-1.2,1), fr, slow=TRUE); toc5=Sys.time()-tic
fit6 = optim(c(-1.2,1), fr, gr=grr, slow=TRUE, method="BFGS"); toc6=Sys.time()-tic-toc5
fit7 = optim(c(-1.2,1), fr, gr=mygr, slow=TRUE, method="BFGS"); toc7=Sys.time()-tic-toc5-toc6
fit8 = optim(c(-1.2,1), fr, gr=mypgr, slow=TRUE, method="BFGS"); toc8=Sys.time()-tic-toc5-toc6-toc7
print(cbind(fast=c(default=toc1,exact.gr=toc2,serial.gr=toc3,parallel.gr=toc4),
slow=c(toc5,toc6,toc7,toc8)))