How Spark handles objects

旧时难觅i 2021-01-02 13:19

To test the serialization exception in Spark, I wrote a task in two ways.
First way:

package examples
import org.apache.spark.SparkConf
import org.apache.…


        
2 Answers
  •  傲寒 2021-01-02 13:40

    When you run code in an RDD closure (map, filter, etc.), everything necessary to execute that code is packaged up, serialized, and sent to the executors to be run. Any objects that are referenced (or whose fields are referenced) will be serialized as part of the task, and this is where you'll sometimes get a NotSerializableException.
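
    For concreteness, here is a minimal sketch of that failure mode (Formatter, ClosureDemo, and the local[*] master are made-up details for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    // A plain class that does NOT extend Serializable.
    class Formatter {
      def format(d: Double): String = f"$d%.2f"
    }

    object ClosureDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("closure-demo").setMaster("local[*]"))
        val f = new Formatter
        // The closure captures `f`, so Spark must serialize it to ship the
        // task to the executors; Formatter is not Serializable, so this
        // fails with "Task not serializable" wrapping a
        // java.io.NotSerializableException.
        sc.parallelize(Seq(1.0, 2.0, 3.0)).map(f.format).collect()
        sc.stop()
      }
    }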

    Your use case is a little more complicated, though, and involves the Scala compiler. Typically, calling a function on a Scala object is the equivalent of calling a Java static method. That object never really exists as an instance -- it's basically like writing the code inline. However, if you assign an object to a variable, then you're actually creating a reference to that object in memory, and the object behaves more like an instance of a class and can have serialization issues (see the Spark sketch after the REPL transcript below).

    scala> object A { 
      def foo() { 
        println("bar baz")
      }
    }
    defined module A
    
    scala> A.foo()  // static method
    bar baz
    
    scala> val a = A  // now we're actually assigning a memory location
    a: A.type = A$@7e0babb1
    
    scala> a.foo()  // dereferences a before calling foo
    bar baz
    
