Trying to get started with doParallel and foreach but no improvement

前端未结

关注

 1  433

I am trying to use the doParallel and foreach package but I\'m getting reduction in performance using the bootstrapping example in the guide found here CRANpage.


                      
              相关标签:


      
      
        
          1条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  既然无缘        
                
              
                            
                2021-02-06 02:15
              
            
            
                                                                       
The underlying problem is that doParallel executes attach for every task execution on the workers of the PSOCK cluster in order to add the exported variables to the package search path.  This resolves various scoping issues, but can hurt performance significantly, particularly with short duration tasks and large amounts of exported data.  This doesn't happen on Linux and Mac OS X with your example, since they will use mclapply, rather than clusterApplyLB, but it will happen on all platforms if you explicitly register a PSOCK cluster.

I believe that I've figured out how to resolve the task scoping problems in a different way that doesn't hurt performance, and I'm working with Revolution Analytics to get the fix into the next release of doParallel and doSNOW, which also has the same problem.

You can work around this problem by using task chunking:

ptime2 <- system.time({
  chunks <- getDoParWorkers()
  r <- foreach(n=idiv(trials, chunks=chunks), .combine='cbind') %dopar% {
    y <- lapply(seq_len(n), function(i) {
      ind <- sample(100, 100, replace=TRUE)
      result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
      coefficients(result1)
    })
    do.call('cbind', y)
  }
})[3]


This results in only one task per worker, so each worker only executes attach once, rather than trials / 3 times.  It also results in fewer but larger socket operations, which can be performed more efficiently on most systems, but in this case, the critical issue is attach.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复