Why doesn't Spark broadcast work well when I use extends App?

别跟我提以往 2020-12-21 07:00

The first code snippet throws a NullPointerException.

object TryBroadcast extends App{
  val conf = new SparkConf().setAppName("o_o")
  val sc = new SparkContext(c         


        
2 Answers
  • 2020-12-21 07:36

    It is not very well documented, but the recommendation is to define def main(args: Array[String]): Unit = ??? instead of using extends App.

    See https://issues.apache.org/jira/browse/SPARK-4170 and https://github.com/apache/spark/pull/3497
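    As a rough sketch of that recommendation, the program from the question could be restructured around an explicit main method. This assumes the job is launched with spark-submit (so no master is set in code) and that the original program only maps over the sample and prints the result, as in the other answer's snippet; the sc.stop() call is an addition.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object TryBroadcast {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("o_o")
        val sc = new SparkContext(conf)
        val sample = sc.parallelize(1 to 1024)

        // bro is a local variable of main, so the closure below captures
        // the Broadcast handle itself rather than a field of the object.
        val bro = sc.broadcast(6666)
        val broSample = sample.map(x => x.toString + bro.value)

        broSample.collect().foreach(println)
        sc.stop() // added here for cleanliness; not in the original code
      }
    }
    ```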

  • 2020-12-21 07:43

    bro is quite different in the two cases. In the first one it is a field on a singleton object (TryBroadcast). In the second one it is a local variable.

    The local variable gets captured, serialized and sent over to the executors. In the first case the reference is to a field, so the singleton would get captured and sent. I'm not sure how a Scala singleton is built and how it is captured. Apparently in this case it ends up uninitialized when it is accessed on the executor.
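    The "uninitialized field" effect can be reproduced without Spark. The sketch below assumes Scala 2, where App is built on DelayedInit, so the object body only runs when main is called; Holder and Demo are hypothetical names used for illustration. On an executor JVM nobody calls TryBroadcast.main, which would be consistent with bro still being null there.

    ```scala
    // Assumes Scala 2: App defers the object body until main() is invoked.
    object Holder extends App {
      // Under DelayedInit this initializer runs inside main(),
      // not when the object is first loaded.
      val greeting: String = "hello"
    }

    object Demo {
      def main(args: Array[String]): Unit = {
        // Holder.main has not run yet in this JVM, so under Scala 2
        // the field is expected to still hold its default value (null).
        println(s"before main: ${Holder.greeting}")

        // Running main executes the deferred body and assigns the field.
        Holder.main(Array.empty[String])
        println(s"after main: ${Holder.greeting}")
      }
    }
    ```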

    You could make bro a local variable like this:

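    import org.apache.spark.{SparkConf, SparkContext}

    object TryBroadcast extends App {
      val conf = new SparkConf().setAppName("o_o")
      val sc = new SparkContext(conf)
      val sample = sc.parallelize(1 to 1024)
      val broSample = {
        // bro is local to this block, so the closure captures the
        // broadcast handle directly instead of a field of TryBroadcast.
        val bro = sc.broadcast(6666)
        sample.map(x => x.toString + bro.value)
      }
      broSample.collect().foreach(println)
    }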
    