I am using Java 8 with Spark v2.4.1.
I am trying to use a broadcast variable Map
for lookups, as shown below:
Input data:
+-----+-
lit() returns a Column, but map.get requires a plain Int key, so the lookup has to run on the row's actual value rather than on a Column expression.
You can do it this way:
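import java.util
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.DataFrame
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

// Sample input: the integers 0 to 9999 in a single "sentiment" column
val df: DataFrame = spark.sparkContext.parallelize(Range(0, 10000), 4).toDF("sentiment")
val map = new util.HashMap[Int, Int]()
map.put(1, 1)
map.put(2, 2)
map.put(3, 3)
// Broadcast the lookup map once so every executor gets a read-only copy
val bf: Broadcast[util.HashMap[Int, Int]] = spark.sparkContext.broadcast(map)
// Read each row's plain Int and look it up; keys missing from the map come back as 0
df.rdd.map(x => {
  val num = x.getInt(0)
  (num, bf.value.get(num))
}).toDF("key", "add_key").show(false)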
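If you would rather stay in the DataFrame API than drop down to the RDD, the same lookup can probably be wrapped in a UDF, since a UDF receives the row's plain Int instead of a Column. This is only a minimal sketch, not a tested solution for your data: the names bc, lookupUdf and result and the sample values are made up for illustration, and a Scala Map is swapped in for the java.util.HashMap so that get returns an Option and missing keys show up as null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session for illustration; reuse your own SparkSession if you have one
val spark = SparkSession.builder().master("local[*]").appName("broadcast-lookup").getOrCreate()
import spark.implicits._

// Hypothetical lookup table, broadcast once to all executors
val bc = spark.sparkContext.broadcast(Map(1 -> 1, 2 -> 2, 3 -> 3))

// The UDF gets the row's Int value, so Map.get can be called directly;
// the Option[Int] result becomes a nullable integer column
val lookupUdf = udf((k: Int) => bc.value.get(k))

// Hypothetical input: 5 is deliberately absent from the map
val df = Seq(1, 2, 5).toDF("sentiment")
val result = df.withColumn("add_key", lookupUdf(col("sentiment")))
result.show(false)
```

The trade-off is that a UDF is opaque to the optimizer, so for a large lookup table a broadcast join may be worth comparing against this.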