Base R is single-threaded, so on a 4-core CPU a busy R session is expected to use only about 25% of the available processing power. On a single Windows machine, it is possible to spread processing across a cluster of worker processes (typically one per core) using either the parallel package or the foreach package.
First of all, the parallel package (included in R 2.14.0+, no need to install) provides functions based on the snow package; these functions are parallel extensions of lapply(). The foreach package provides an extension of the for-loop construct. Note that it needs a registered parallel backend to run in parallel, which here is the doParallel package.
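To make the lapply() relationship concrete, here is a minimal sketch that uses only the parallel package. The toy function square, the input vector and the choice of two workers are assumptions made for illustration; they are not part of the k-means example.

```r
library(parallel)

# Hypothetical toy task: square each input.
square <- function(x) x^2
inputs <- 1:8

# Sequential version with lapply().
seq_res <- lapply(inputs, square)

# Parallel version: start two worker processes (a socket cluster, which
# also works on Windows), run the same call through parLapply(), then
# shut the workers down.
cl <- makeCluster(2)
par_res <- parLapply(cl, inputs, square)
stopCluster(cl)

# Both calls return a list in the same order as the inputs.
identical(seq_res, par_res)
```

Apart from the extra cluster argument, the call has the same shape as lapply(), which is why existing lapply() code is usually easy to move over.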
Below is a quick example of k-means clustering using both packages. The idea is simple: (1) fit kmeans() on each worker, (2) combine the outcomes and (3) select the fit with the minimum tot.withinss.
library(parallel)
library(iterators)
library(foreach)
library(doParallel)
# parallel
split = detectCores()
eachStart = 25
cl = makeCluster(split)
init = clusterEvalQ(cl, { library(MASS); NULL })
results = parLapplyLB(cl,
                      rep(eachStart, split),
                      function(nstart) kmeans(Boston, 4, nstart=nstart))
withinss = sapply(results, function(result) result$tot.withinss)
result = results[[which.min(withinss)]]
stopCluster(cl)
result$tot.withinss
#[1] 1814438
# foreach
split = detectCores()
eachStart = 25
# set up iterators
iters = iter(rep(eachStart, split))
# set up combine function
comb = function(res1, res2) {
  if(res1$tot.withinss < res2$tot.withinss) res1 else res2
}
cl = makeCluster(split)
registerDoParallel(cl)
result = foreach(nstart=iters, .combine="comb", .packages="MASS") %dopar%
  kmeans(Boston, 4, nstart=nstart)
stopCluster(cl)
result$tot.withinss
#[1] 1814438
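Neither example above fixes a random seed, and kmeans() picks its starting centers at random, so the reported tot.withinss can differ between runs. One possible way to make the parallel version repeatable is clusterSetRNGStream() from the parallel package, sketched below. The built-in iris data, the two workers, the seed value and the helper names fit_kmeans and run_once are assumptions for illustration only. The sketch uses parLapply() rather than parLapplyLB(), because load balancing may assign tasks to workers in a different order on each run.

```r
library(parallel)

# Assumption: the built-in iris measurements stand in for the Boston data.
dat <- as.matrix(iris[, 1:4])

# Hypothetical worker function: one kmeans() fit with `nstart` random starts.
fit_kmeans <- function(nstart, data) {
  kmeans(data, centers = 3, nstart = nstart)
}

# Hypothetical helper: run the fits on a fresh two-worker cluster and
# return the smallest total within-cluster sum of squares.
run_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  # Give each worker its own reproducible random-number stream.
  clusterSetRNGStream(cl, iseed = seed)
  fits <- parLapply(cl, rep(5, 2), fit_kmeans, data = dat)
  min(sapply(fits, function(fit) fit$tot.withinss))
}

first  <- run_once(123)
second <- run_once(123)

# With the same seed and the same number of workers, the two runs
# should agree.
identical(first, second)
```

The data is passed as an explicit argument here, so the workers do not need to load a package to find it; the original examples achieve the same thing by loading MASS on each worker.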
Further details of those packages and more examples can be found in the following posts.
- Parallel Processing on Single Machine I
- Parallel Processing on Single Machine II
- Parallel Processing on Single Machine III