To test the serialization exception in Spark, I wrote the same task in two ways: the first calls the function on an object directly inside the closure, while the second first assigns the object to a local val (handler) and calls the function through that. Only the second version fails with a serialization exception; both variants are shown in the snippets below.
In order for Spark to distribute a given operation, the function used in the operation needs to be serialized. Before serialization, that function passes through a fairly complex process appropriately called "ClosureCleaner".
The intention is to "cut off" closures from their context, in order to reduce the size of the object graph that needs to be serialized and to reduce the risk of serialization issues in the process. In other words, it ensures that only the code needed to execute the function is serialized and sent for deserialization and execution "on the other side".
During that process, the closure is also checked for serializability, so that serialization problems are detected proactively at the call site on the driver instead of later, when tasks are shipped to the executors (SparkContext#clean).
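As a rough illustration of what that proactive check amounts to, the sketch below tries to Java-serialize a function value up front, which is essentially what the eager check boils down to. This is not Spark's actual implementation (Spark goes through ClosureCleaner and its configured serializer); the checkSerializable helper and the sample function are made up for the example.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializabilityCheck {
  // Rough approximation of the eager check: serialize the closure's object
  // graph now, so a NotSerializableException surfaces before any task is
  // shipped to an executor.
  def checkSerializable(closure: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(closure)
      println("closure is serializable")
    } catch {
      case e: NotSerializableException =>
        println(s"closure is NOT serializable: ${e.getMessage}")
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val addOne = (i: Int) => i + 1      // a self-contained function value
    checkSerializable(addOne)           // prints: closure is serializable
  }
}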
The ClosureCleaner code is dense and complex, so it is hard to pinpoint the exact code path that leads to this case.
Intuitively, what's happening is that when the ClosureCleaner finds:
val result = rdd.map { elem =>
  funcs.func_1(elem)
}
It evaluates the inner members of the closure to be references to the funcs object, which can be recreated on the receiving side, and finds no further references to the enclosing scope. The cleaned closure therefore only contains { elem => funcs.func_1(elem) }, which can be serialized by the JavaSerializer.
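A minimal, self-contained sketch of this working variant could look like the following. The names funcs and func_1 come from the snippet above; the SparkConf setup, the function body, and the sample data are assumptions made only so the example runs.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical singleton object, following the snippet above.
object funcs {
  def func_1(i: Int): Int = i + 1
}

object WorkingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("closure-cleaner-ok")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10)

    // Only { elem => funcs.func_1(elem) } ends up in the serialized closure.
    val result = rdd.map { elem => funcs.func_1(elem) }
    println(result.collect().mkString(", "))

    sc.stop()
  }
}

Because funcs is referenced directly inside the lambda, nothing from the enclosing scope needs to be captured, which matches the behaviour described above.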
Instead, when the closure cleaner evaluates:
val handler = funcs
val result = rdd.map(elem => {
  handler.func_1(elem)
})
It finds that the closure has a reference to $outer (through handler), so it inspects that outer scope and adds the variable's instance to the cleaned closure. We could imagine the resulting cleaned closure to be something of this shape (this is for illustrative purposes only):
{ elem =>
  val handler = funcs
  handler.func_1(elem)
}
When this closure is tested for serializability, it fails to serialize. Per JVM serialization rules, an object is serializable only if all of its members are recursively serializable. In this case, handler references a non-serializable object and the check fails.
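To make the failure concrete, here is a minimal sketch of the second variant. As above, funcs, func_1 and handler follow the snippets in the text, while the SparkConf setup, function body and data are assumptions; since the funcs object itself does not extend Serializable, the captured handler reference would be expected to make Spark's eager check fail already at the map call with "org.apache.spark.SparkException: Task not serializable", caused by a java.io.NotSerializableException.

import org.apache.spark.{SparkConf, SparkContext}

// Same hypothetical singleton as before; note it does NOT extend Serializable.
object funcs {
  def func_1(i: Int): Int = i + 1
}

object FailingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("closure-cleaner-fail")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 10)

    val handler = funcs   // local reference captured by the closure below

    // The eager serializability check (SparkContext#clean) runs here and is
    // expected to throw "Task not serializable", because the captured handler
    // points at the non-serializable funcs singleton.
    val result = rdd.map(elem => {
      handler.func_1(elem)
    })

    println(result.collect().mkString(", "))   // never reached

    sc.stop()
  }
}

Referencing funcs directly inside the closure, as in the first variant, avoids capturing it, which is why that version serializes cleanly.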