How does Spark send closures to workers?


When I write an RDD transformation, e.g.

val rdd = sc.parallelize(1 to 1000)
rdd.map(x => x * 3)

I understand that the closure (x => x * 3) has to be shipped to the worker nodes, but how does Spark actually send it to them?

1 Answer

    The closures are most certainly serialized at runtime. I have seen plenty of Task not serializable exceptions at runtime, from both PySpark and Scala. There is complex code in ClosureCleaner.scala that handles this.

    From ClosureCleaner.scala

        def clean(
            closure: AnyRef,
            checkSerializable: Boolean = true,
            cleanTransitively: Boolean = true): Unit = {
          clean(closure, checkSerializable, cleanTransitively, Map.empty)
        }
    

    that attempts to minify the code being serialized, nulling out references in the closure's enclosing scope that the function does not actually use. If the result is serializable, it is sent across the wire to the executors; otherwise an exception is thrown.
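
    To see both outcomes end to end, here is a minimal sketch; the Multiplier class and the names below are made up for illustration. Note that the check runs eagerly, when the transformation is defined, not when a job executes:

        import org.apache.spark.{SparkConf, SparkContext}

        // Deliberately NOT Serializable: capturing an instance of this
        // in a closure makes the serializability check fail.
        class Multiplier(val factor: Int)

        object ClosureDemo {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("closure-demo").setMaster("local[*]"))
            val rdd = sc.parallelize(1 to 1000)

            val m = new Multiplier(3)
            // Uncommenting this throws org.apache.spark.SparkException:
            // "Task not serializable" at the map() call site, because the
            // closure captures the non-serializable `m`:
            // rdd.map(x => x * m.factor)

            // Capturing only the primitive value is fine; this closure is
            // serialized and shipped to the executors:
            val factor = m.factor
            println(rdd.map(x => x * factor).sum())

            sc.stop()
          }
        }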

    Here is another excerpt from ClosureCleaner that checks whether an incoming function can be serialized:

        private def ensureSerializable(func: AnyRef) {
          try {
            if (SparkEnv.get != null) {
              SparkEnv.get.closureSerializer.newInstance().serialize(func)
            }
          } catch {
            case ex: Exception => throw new SparkException("Task not serializable", ex)
          }
        }
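
    The closureSerializer here is Spark's JavaSerializer, so you can approximate the same check with plain Java serialization. This is a rough stand-in for illustration, not Spark's actual code path:

        import java.io.{ByteArrayOutputStream, ObjectOutputStream}

        // Roughly what ensureSerializable verifies: can this function be
        // written out with Java serialization?
        def checkSerializable(func: AnyRef): Unit = {
          val out = new ObjectOutputStream(new ByteArrayOutputStream())
          try out.writeObject(func) // throws NotSerializableException if it cannot
          finally out.close()
        }

        // A plain Scala lambda passes the check:
        checkSerializable((x: Int) => x * 3)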
    