LinkedHashMap variable is not accessible outside the foreach loop

清酒与你 2021-01-25 20:28

Here is my code.

var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.m         


        
1 answer
  • 2021-01-25 21:17

    Why your code cannot work as written:

    1. Before the foreach task starts, the whole closure of the function passed to foreach is serialized and sent to each of the workers. This means each worker has its own instance of mutable.LinkedHashMap, a copy of link.
    2. During the foreach, each worker puts its items into its own copy of link.
    3. After the task is done, the local link is still empty, and the non-empty copies remain on the worker nodes, out of reach.
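
    The three steps above can be imitated without a cluster. The following is a minimal sketch in plain Scala, not Spark itself: it assumes that shipping a closure behaves roughly like a Java serialization round trip of the captured variable. The names ClosureCopySketch, roundTrip and simulate are hypothetical, introduced only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopySketch {
  // Serializes a value to bytes and reads it back, producing an independent copy.
  // Assumption: this approximates what happens to variables captured by a closure
  // before a task runs on a worker.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // Returns (driver-side map, worker-side copy) after the "worker" has written to its copy.
  def simulate(): (mutable.LinkedHashMap[String, String], mutable.LinkedHashMap[String, String]) = {
    val link = mutable.LinkedHashMap[String, String]() // lives on the driver
    val workerCopy = roundTrip(link)                    // what a worker receives
    workerCopy.put("name", "String")                    // the write lands in the copy only
    (link, workerCopy)
  }

  def main(args: Array[String]): Unit = {
    val (driverSide, workerSide) = simulate()
    println(s"driver side: $driverSide") // still empty
    println(s"worker side: $workerSide") // holds the one entry
  }
}
```

    The write goes into the deserialized copy, so the original map never sees it, which is the behaviour described in step 3.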

    The moral is clear: don't fill local mutable collections from inside RDD operations. It is just not going to work.

    One way to bring the whole collection to the local machine is the collect method. You can use it like this:

    val link = fieldTypeMapRDD.collect.toMap
    

    or, if you need to preserve the order:

    import scala.collection.immutable.ListMap
    val link = ListMap(fieldTypeMapRDD.collect:_*)
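
    To see the difference between the two variants without Spark, here is a small sketch in which a plain Array of pairs stands in for the result of fieldTypeMapRDD.collect. The object name OrderSketch and the sample field names are made up for illustration.

```scala
import scala.collection.immutable.ListMap

object OrderSketch {
  // Hypothetical stand-in for fieldTypeMapRDD.collect: an Array of (field, type) pairs.
  val collected: Array[(String, String)] = Array(
    "id" -> "Int", "name" -> "String", "city" -> "String",
    "zip" -> "Int", "ts" -> "Long", "flag" -> "Boolean"
  )

  // toMap builds a general immutable Map; its iteration order is not guaranteed
  // to match the order of the input pairs.
  val unordered: Map[String, String] = collected.toMap

  // ListMap keeps the pairs in insertion order.
  val ordered: ListMap[String, String] = ListMap(collected: _*)

  def main(args: Array[String]): Unit = {
    println(unordered.keys.mkString(", "))
    println(ordered.keys.mkString(", "))
  }
}
```

    Both maps hold the same entries; only the ListMap promises to iterate them in the order they were collected.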
    

    But if you really want a mutable collection, you can modify your code a bit. Just change

    fieldTypeMapRDD.foreach {
    

    to

    fieldTypeMapRDD.toLocalIterator.foreach {
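
    Put together, the toLocalIterator variant might look like the sketch below. It assumes Spark is on the classpath and that the RDD holds (String, String) pairs, which the truncated question does not confirm. The object name, the helper toLinkedMap, the local[2] master and the sample data are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable

object LocalIteratorSketch {
  // Streams the RDD's elements back to the driver, partition by partition,
  // and fills a driver-side LinkedHashMap. The foreach here runs on a plain
  // Scala Iterator on the driver, so the map really is mutated locally.
  def toLinkedMap(rdd: RDD[(String, String)]): mutable.LinkedHashMap[String, String] = {
    val link = mutable.LinkedHashMap[String, String]()
    rdd.toLocalIterator.foreach { case (field, tpe) => link.put(field, tpe) }
    link
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master, only for trying the sketch out.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sketch"))
    try {
      val rdd = sc.parallelize(Seq("id" -> "Int", "name" -> "String", "ts" -> "Long"), 2)
      println(toLinkedMap(rdd))
    } finally sc.stop()
  }
}
```

    Like collect, this pulls all the data to the driver, so it only suits results that fit in the driver's memory one partition at a time.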
    

    See also this question.
