Why does Spark fail with “Failed to get broadcast_0_piece0 of broadcast_0” in local mode?

孤城傲影 2021-02-06 10:59

I'm running this snippet to sort an RDD of points, ordering the RDD by distance and taking the K nearest points to a given point:

def getKNN(sparkContext:SparkContext, k:         
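
For reference, this is roughly the shape of the method (the Point class, the distance function, and the parameters here are placeholders for illustration, not the exact original code):

import org.apache.spark.rdd.RDD

// Hypothetical sketch only; Point, distance, and the parameters are assumptions.
case class Point(x: Double, y: Double)

object KNNSketch {
  def distance(a: Point, b: Point): Double =
    math.hypot(a.x - b.x, a.y - b.y)

  // Sort the RDD by distance to the query point and take the k closest points.
  def getKNN(points: RDD[Point], k: Int, query: Point): Array[Point] =
    points.sortBy(p => distance(p, query)).take(k)
}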


        
5 Answers
  • 2021-02-06 11:32

    Related to the above answers, I encountered this issue when I inadvertently serialized a Datastax connector (i.e. the Cassandra connection driver) query to a Spark worker. That spun off its own SparkContext, and within 4 seconds the entire application had crashed. A generic sketch of the anti-pattern, and of a safer per-partition variant, follows.
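
    For illustration only, here is a minimal, generic sketch; HeavyClient is a made-up stand-in for any non-serializable connection object, not the Datastax API:

    import org.apache.spark.rdd.RDD

    // Made-up stand-in for any non-serializable connection (e.g. a database driver).
    class HeavyClient {
      def lookup(key: String): String = key.toUpperCase // placeholder logic
    }

    object ClosureSketch {
      // Anti-pattern: the client is created on the driver and captured by the closure,
      // so Spark tries to serialize it and ship it to the executors.
      def badLookup(rdd: RDD[String], client: HeavyClient): RDD[String] =
        rdd.map(key => client.lookup(key))

      // Safer: build one client per partition, on the executor that actually uses it.
      def goodLookup(rdd: RDD[String]): RDD[String] =
        rdd.mapPartitions { keys =>
          val client = new HeavyClient()
          keys.map(client.lookup)
        }
    }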

  • 2021-02-06 11:34

    Just discovered why I was getting this exception: for some reason my SparkContext object was started and stopped several times between ScalaTest methods. Fixing that behaviour (keeping a single context alive for the whole suite, as sketched below) got Spark working the way I expected.
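
    If it helps, a minimal sketch (assuming ScalaTest 3.x; the suite and test are made up) of sharing one SparkContext across the whole suite instead of starting and stopping it between test methods:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.BeforeAndAfterAll
    import org.scalatest.funsuite.AnyFunSuite

    // One SparkContext for the whole suite: created before the first test,
    // stopped once after the last one.
    class MySparkSuite extends AnyFunSuite with BeforeAndAfterAll {
      @transient private var sc: SparkContext = _

      override def beforeAll(): Unit = {
        val conf = new SparkConf().setMaster("local[*]").setAppName("test")
        sc = new SparkContext(conf)
      }

      override def afterAll(): Unit = {
        if (sc != null) sc.stop()
      }

      test("parallelize and count") {
        assert(sc.parallelize(1 to 10).count() === 10)
      }
    }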

  • 2021-02-06 11:34

    This helped in my case, because a SparkContext had already been created:

    val sc = SparkContext.getOrCreate()
    

    Before that, I had tried this:

    val conf = new SparkConf().setAppName("Testing").setMaster("local").set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    

    But it broke when I ran:

     spark.createDataFrame(rdd, schema)
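
    If it is useful, here is a minimal sketch (the names are my own) of going through a single SparkSession obtained with getOrCreate, so that createDataFrame and the RDD share the same underlying SparkContext:

    import org.apache.spark.sql.SparkSession

    object GetOrCreateSketch {
      def main(args: Array[String]): Unit = {
        // Reuses an existing session/context if one is already running.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("Testing")
          .getOrCreate()

        val sc = spark.sparkContext // the same context the session owns

        import spark.implicits._
        val df = sc.parallelize(Seq(1, 2, 3)).toDF("value")
        df.show()

        spark.stop()
      }
    }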
    
  • 2021-02-06 11:38

    I was also facing the same issue. After a lot of googling I found that I had made a singleton class for SparkContext initialization, which is only valid for a single JVM instance; in Spark, however, that singleton is invoked from each worker node, which runs in a separate JVM instance, and hence it leads to multiple SparkContext objects. A rough sketch of the problem is below.
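
    Hypothetical illustration (the class names are my own): the singleton is safe on the driver, but code inside an RDD transformation runs on worker JVMs, where touching the singleton would lazily create another SparkContext.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical singleton, as described above: one lazily created SparkContext per JVM.
    object SparkContextSingleton {
      lazy val sc: SparkContext =
        new SparkContext(new SparkConf().setMaster("local[*]").setAppName("singleton-demo"))
    }

    object SingletonUsage {
      // Fine: runs on the driver, so the singleton is created exactly once.
      def onDriver(): Long =
        SparkContextSingleton.sc.parallelize(1 to 10).count()

      // Broken: the closure runs on worker JVMs, each of which would lazily
      // create its own SparkContext the moment it touches the singleton.
      def onWorkers(rdd: RDD[Int]): RDD[Long] =
        rdd.map(_ => SparkContextSingleton.sc.parallelize(1 to 10).count())
    }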

  • 2021-02-06 11:40

    I was getting this error as well. I haven't really seen any concrete code examples, so I will share my solution. It cleared the error for me, but I have a sense that there may be more than one solution to this problem; still, it is worth a go, as it keeps everything within the code.

    It looks as though the SparkContext was shutting down, thus throwing the error. I think the issue is that the SparkContext is created in a class and then extended by other classes; the extension causes it to shut down, which is a bit annoying. Below is the implementation I used to get this error to clear.

    Spark Initialisation Class:

    import org.apache.spark.{SparkConf, SparkContext}

    class Spark extends Serializable {
      def getContext: SparkContext = {
        // @transient lazy so neither the conf nor the context is serialized with the class
        @transient lazy val conf: SparkConf =
          new SparkConf()
            .setMaster("local")
            .setAppName("test")

        @transient lazy val sc: SparkContext = new SparkContext(conf)
        sc.setLogLevel("OFF")

        sc
      }
    }
    

    Main Class:

    import org.apache.spark.rdd.RDD

    object Test extends Spark {

      def main(args: Array[String]): Unit = {
        val sc = getContext
        val irisRDD: RDD[String] = sc.textFile("...")
        // ...
      }
    }
    

    Then just extend your other classes with the Spark class and it should all work out; a hypothetical example is below.
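
    For example, a hypothetical class reusing it might look like this:

    import org.apache.spark.rdd.RDD

    // Hypothetical example of another class reusing the Spark class above.
    object WordCounts extends Spark {
      def count(path: String): Long = {
        val sc = getContext
        val lines: RDD[String] = sc.textFile(path)
        lines.flatMap(_.split("\\s+")).count()
      }
    }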

    I was getting the error when running LogisticRegression models, but I would assume this fix applies when using other machine learning libraries as well.
