Here is my code.
var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.m
Why your code is not supposed to work:
foreach
task is started, whole your function's closure inside foreach
block is serialized and sent first to master, then to each of workers. This means each of them will have its own instance of mutable.LinkedHashMap
as copy of link
.foreach
block each worker will put each of its items inside its own link
copylink
and several non-empty former copies on each of worker nodes.Moral is clear: don't use local mutable collections with RDD. It's just not going to work.
One way to get whole collection to local machine is collect
method.
You can use it as:
val link = fieldTypeMapRDD.collect.toMap
or in case of need to preserve the order:
import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
But if you are really into mutable
collections, you can modify your code a bit. Just change
fieldTypeMapRDD.foreach {
to
fieldTypeMapRDD.toLocalIterator.foreach {
See also this question.