Spark: Difference between Shuffle Write, Shuffle spill (memory), Shuffle spill (disk)?

前端未结


暖寄归人 2021-02-01 17:29

I have the following Spark job, trying to keep everything in memory:

val myOutRDD = myInRDD.flatMap { fp =>
  val tuple2List: ListBuffer[(String, myClass)] =


      
      
        
3 Answers

被撕碎了的回忆 (OP) 2021-02-01 18:09

One more note on how to prevent shuffle spill, since I think that is the most important part of the question from a performance standpoint (shuffle write, as mentioned above, is a required part of shuffling).

Spilling occurs when, at shuffle read, a reducer cannot fit all of the records assigned to it into the shuffle memory on its executor. If your shuffle is unbalanced (e.g. some output partitions are much larger than some input partitions), you may get shuffle spill even if the partitions "fit in memory" before the shuffle. The best way to control this is by either
A) balancing the shuffle, e.g. changing your code to reduce before shuffling or shuffling on different keys, or
B) changing the shuffle memory settings as suggested above.
Given the extent of the spill to disk, you probably need to do A rather than B.
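
To make "reduce before shuffling" concrete, here is a minimal sketch. It assumes a per-key aggregation (a count) and a local Spark session; the object name, method names and sample keys are made up for illustration and are not taken from the question's job. The point is only the contrast between `reduceByKey`, which combines values on the map side before the shuffle, and `groupByKey`, which sends every record through it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; none of these names come from the question.
object ReduceBeforeShuffle {

  // Option A: reduceByKey merges values inside each map task first, so each
  // map task writes at most one record per key into the shuffle.
  def countsPerKey(spark: SparkSession, keys: Seq[String]): Map[String, Long] =
    spark.sparkContext
      .parallelize(keys, numSlices = 4)
      .map(k => (k, 1L))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  // For contrast: groupByKey ships every record through the shuffle, so the
  // reducer that owns a hot key receives all of that key's values.
  def countsViaGroupByKey(spark: SparkSession, keys: Seq[String]): Map[String, Long] =
    spark.sparkContext
      .parallelize(keys, numSlices = 4)
      .map(k => (k, 1L))
      .groupByKey()
      .mapValues(_.sum)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("reduce-before-shuffle")
      .getOrCreate()
    try {
      // Deliberately skewed input: one hot key plus two rare keys.
      val keys = Seq.fill(1000)("hot") ++ Seq("a", "b")
      println(countsPerKey(spark, keys))
      println(countsViaGroupByKey(spark, keys))
    } finally {
      spark.stop()
    }
  }
}
```

Both methods return the same counts. What differs is how many records cross the shuffle, which you can compare in the Shuffle Write and Shuffle spill columns of the Spark UI for each stage. If your job cannot be expressed as a reduction, shuffling on a different or more evenly distributed key is the other route under A.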
    
             
                                                        
            
            
              
                