How to deal with error SPARK-5063 in Spark

终归单人心 2020-12-06 12:19

I get error SPARK-5063 on the println line:

d.foreach { x =>
  for (i <- 0 until x.length)
    println(m.lookup(x(i)))   // m.lookup is an RDD action nested inside d.foreach, which triggers SPARK-5063
}
2 Answers
  • 2020-12-06 12:29

    SPARK-5063 is the Spark issue that introduced clearer error messages for attempts to nest RDD operations, which Spark does not support.

    It's a usability issue, not a functional one: the ticket only made the failure easier to understand. The root cause is the nesting of RDD operations, and the solution is to break that nesting up.

    What the question's code actually does is a join of d and m (call them dRDD and mRDD). If mRDD is large, rdd.join is the recommended approach. Otherwise, if mRDD is small, i.e. it fits in the memory of each executor, we can collect it, broadcast it, and do a 'map-side' join.

    JOIN

    A simple join would go like this:

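    // In spark-shell, sc is the predefined SparkContext
    val rdd = sc.parallelize(Seq(Array("one","two","three"), Array("four", "five", "six")))
    val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
    // Key every array element by itself so it can be joined against the lookup table
    val flat = rdd.flatMap(_.toSeq).keyBy(x => x)
    // join yields (key, (key, value)); keep the (key, value) part
    val res = flat.join(map).map { case (k, v) => v }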
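    Applied to the question's own loop, the same idea might look like the sketch below. It assumes d is an RDD[Array[String]] and m an RDD[(String, Int)], with spark-core on the classpath and a local master; the object LookupViaJoin, the method lookupAll and the sample data are hypothetical. Collecting the joined result also moves the println to the driver, so the output appears in the driver console rather than in executor logs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupViaJoin {
  // Hypothetical helper: replaces d.foreach { x => ... m.lookup(x(i)) ... }
  // with one join, so no RDD operation runs inside another one.
  def lookupAll(d: RDD[Array[String]], m: RDD[(String, Int)]): Array[(String, Int)] = {
    d.flatMap(_.toSeq)                        // one record per array element
      .keyBy(k => k)                          // (element, element)
      .join(m)                                // inner join: elements missing from m are dropped
      .map { case (k, (_, v)) => (k, v) }     // keep (element, looked-up value)
      .collect()                              // bring the results back to the driver
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[*]").setAppName("lookup-via-join"))
    // Hypothetical sample data standing in for the question's d and m
    val d = sc.parallelize(Seq(Array("one", "two"), Array("three", "seven")))
    val m = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3))
    // println now runs on the driver
    lookupAll(d, m).sortBy(_._1).foreach(println)
    sc.stop()
  }
}
```

    Note that m.lookup returns every value for a key, and join likewise emits one pair per matching value, so duplicate keys in m behave the same way in both versions.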
    

    To use a broadcast variable, we first need to collect the resolution table to the driver so that it can be broadcast to all executors. NOTE: the RDD being broadcast MUST fit in the memory of the driver as well as in the memory of each executor.

    Map-side JOIN with Broadcast variable

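    val rdd = sc.parallelize(Seq(Array("one","two","three"), Array("four", "five", "six")))
    val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
    // Collect the small table to the driver as a Map and ship it to every executor once
    val bcTable = sc.broadcast(map.collectAsMap())
    // Each lookup is local; bcTable.value(elem) throws NoSuchElementException if elem is missing
    val res2 = rdd.flatMap { arr => arr.map(elem => (elem, bcTable.value(elem))) }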
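    Because bcTable.value(elem) throws for a key that is not in the table, the broadcast version is not quite equivalent to the inner join above. A minimal sketch of a variant that drops unmatched elements instead, assuming the same shapes as above, spark-core on the classpath and a local master; the object BroadcastLookup and the method lookupAll are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastLookup {
  // Hypothetical helper: map-side join through a broadcast variable.
  def lookupAll(sc: SparkContext, rdd: RDD[Array[String]],
                table: RDD[(String, Int)]): Array[(String, Int)] = {
    // The table must fit in driver memory and in each executor's memory
    val bcTable = sc.broadcast(table.collectAsMap())
    val res = rdd.flatMap { arr =>
      val t = bcTable.value                  // local copy on the executor, no RDD action involved
      arr.filter(elem => t.contains(elem))   // drop unmatched elements, like an inner join
        .map(elem => (elem, t(elem)))
    }.collect()
    bcTable.destroy()                        // release the broadcast once the job has finished
    res
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[*]").setAppName("broadcast-lookup"))
    val rdd = sc.parallelize(Seq(Array("one", "two"), Array("three", "seven")))
    val table = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3))
    lookupAll(sc, rdd, table).foreach(println)
    sc.stop()
  }
}
```

    The trade-off is the one described above: no shuffle is needed, at the cost of holding the whole table in memory on every executor.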
    
  • 2020-12-06 12:38

    Spark's full error message for this case reads:

    This RDD lacks a SparkContext. It could happen in the following cases:
    RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation.
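    The invalid example from the error message can be fixed by running the action on the driver first and capturing only its plain result in the closure. A minimal sketch, assuming rdd1 holds Long values and rdd2 is a pair RDD; the object CountOnDriver, the method scaleByCount and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountOnDriver {
  // Invalid: rdd1.map(x => rdd2.values.count() * x) nests an action inside a transformation.
  // Valid: compute the count on the driver, then use the resulting Long inside map.
  def scaleByCount(rdd1: RDD[Long], rdd2: RDD[(String, Long)]): RDD[Long] = {
    val n: Long = rdd2.values.count()   // action invoked by the driver
    rdd1.map(x => n * x)                // the closure captures a Long, not an RDD
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[*]").setAppName("count-on-driver"))
    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq("a" -> 10L, "b" -> 20L))
    println(scaleByCount(rdd1, rdd2).collect().mkString(", "))   // prints 2, 4, 6
    sc.stop()
  }
}
```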
