Parallelization in R: %dopar% vs %do%. Why does using a single core yield better performance?

Asked by 面向向阳花 on 2021-02-09 14:14

I'm experiencing weird behaviour on my computer when distributing processes among its cores using doMC and foreach. Does someone know why I get better performance using a single core?

1 Answer
  • Answered 2021-02-09 15:11

    It's the collection and combination of results that eats all the processing time. These are the timings on my machine for the cores=2 scenario when no results are returned. It's essentially the same code, only the created matrices are discarded instead of being returned:

    > system.time(m <- foreach(i=1:100) %do% 
    + { matrix(rnorm(1000*1000), ncol=5000); NULL } )
       user  system elapsed 
     13.793   0.376  14.197 
    > system.time(m <- foreach(i=1:100) %dopar% 
    + { matrix(rnorm(1000*1000), ncol=5000); NULL } )
       user  system elapsed 
      8.057   5.236   9.970 
    

    Still not optimal, but at least the parallel version is now faster.
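A minimal end-to-end sketch of the pattern above, assuming the doMC and foreach packages are installed (the answer's snippets presume a backend has already been registered; the matrix sizes here are shrunk for illustration):

```r
library(foreach)
library(doMC)

# Register the backend first; without this, %dopar% falls back to
# sequential execution and emits a warning. doMC forks, so this
# parallelizes on Unix-like systems only, not on Windows.
registerDoMC(cores = 2)

# Discard the large matrix inside each task and return NULL, so almost
# nothing has to be serialized back to the master process.
m <- foreach(i = 1:4) %dopar% {
  matrix(rnorm(100 * 100), ncol = 100)  # the real work happens here
  NULL                                  # tiny return value
}
```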

    This is from the doMC documentation:

    The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package.

    Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from separate processes is an expensive task, and this is what you see in your time measurements.
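Because the collection cost scales with the size of the returned objects, a common workaround when you can't discard results entirely is to reduce them inside each worker and return only a small summary. A hedged sketch of that idea (the `colMeans` reduction is illustrative, not from the original question):

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

# Each task returns one column-means vector instead of the whole matrix,
# so only a few hundred numbers cross the fork boundary per task.
# .combine = rbind stacks the vectors into a single result matrix.
means <- foreach(i = 1:4, .combine = rbind) %dopar% {
  m <- matrix(rnorm(100 * 100), ncol = 100)
  colMeans(m)  # small summary; the large matrix stays in the worker
}
dim(means)  # 4 rows (one per task) by 100 columns
```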
