Parallelization in R: %dopar% vs %do%. Why does using a single core yield better performance?

面向向阳花 2021-02-09 14:14

I'm experiencing weird behaviour on my computer when distributing processes among its cores using doMC and foreach. Does anyone know why I get better performance using a single core?

1 answer

误落风尘 (OP)

2021-02-09 15:11
It's collecting and combining the results that eats up the processing time. Here are the timings on my machine for the cores=2 scenario when no results are returned. It's essentially the same code, except that each iteration discards the matrix it creates (returning NULL) instead of returning it:
```
> system.time(m <- foreach(i=1:100) %do% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
 13.793   0.376  14.197 
> system.time(m <- foreach(i=1:100) %dopar% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
  8.057   5.236   9.970 
```
Still not optimal, but at least the parallel version is now faster.
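The usual remedy follows from this: do as much reduction as possible inside the worker so that only a small object travels back. Below is a minimal sketch, assuming the caller only needs a per-task summary (here, hypothetically, the mean of each matrix) rather than the full matrix. It calls `parallel::mclapply` (the fork-based function doMC builds on) directly so it runs without doMC; `summarise_task` is a hypothetical name, and no timings are claimed for it.

```r
library(parallel)

# Hypothetical worker: builds the same 200 x 5000 matrix as in the timings
# above, but returns only its mean, so each task sends back a single double
# instead of an ~8 MB matrix.
summarise_task <- function(i) {
  m <- matrix(rnorm(1000 * 1000), ncol = 5000)
  mean(m)
}

# Forking (mc.cores > 1) is not supported on Windows, so fall back to 1 core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# 20 tasks instead of 100, only to keep the sketch quick to run.
timing <- system.time(
  res <- mclapply(1:20, summarise_task, mc.cores = n_cores)
)
print(timing)

# One numeric summary per task.
means <- unlist(res)
```

The same idea carries over to foreach/%dopar%: put the reduction inside the loop body rather than returning the raw matrices and reducing afterwards.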

From the doMC documentation:

> The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package.

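Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from those separate processes is expensive: each returned object has to be serialized in the worker, sent back to the master process, and unserialized there. In your original code every iteration returns a 200 × 5000 matrix of doubles, i.e. 1,000,000 × 8 bytes ≈ 8 MB, so 100 iterations move roughly 800 MB between processes. That transfer is the overhead you see in your time measurements.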
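To get a feel for how much data crosses the process boundary, one can measure the serialized size of a single result with base R's `serialize()`. This is a rough proxy, assuming the fork backend ships results back in R's serialization format (which, as far as I know, is how `parallel`'s forked workers return values). `result_bytes` is a hypothetical helper name.

```r
# Size in bytes of an object once serialized, roughly what a forked worker
# has to send back to the master process for one task's result.
result_bytes <- function(x) length(serialize(x, connection = NULL))

big   <- result_bytes(matrix(rnorm(1000 * 1000), ncol = 5000))
small <- result_bytes(NULL)

cat(sprintf("one matrix:   %.1f MB\n", big / 1e6))
cat(sprintf("100 matrices: %.1f MB\n", 100 * big / 1e6))
cat(sprintf("NULL:         %d bytes\n", small))
```

The gap between the matrix and NULL cases is what separates your original timings from the ones above where nothing is returned.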