What is the best way to perform a flatMap on a DataFrame in Spark?
From searching around and doing some testing, I have come up with two different solutions.
You can create a second DataFrame from your Map:
import spark.implicits._  // needed for .toDF and the $"..." column syntax

val mapDF = Map("a" -> List("c","d","e"), "b" -> List("f","g","h")).toList.toDF("key", "value")
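For context, the join below assumes an existing input DataFrame df with a key column x and a second column y; those names and the sample rows here are just placeholders, so a minimal sketch of such an input might be:

// Hypothetical input DataFrame; the column names "x" and "y" and the
// sample rows are assumptions chosen to match the join condition below.
val df = Seq(("a", 1), ("b", 2)).toDF("x", "y")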
Then do the join and apply the explode function:
import org.apache.spark.sql.functions.explode

val joinedDF = df.join(mapDF, df("x") === mapDF("key"), "inner")
  .select("value", "y")
  .withColumn("value", explode($"value"))  // one output row per element of the list
And you get the flattened result:
joinedDF.show()
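With the placeholder df sketched above, the output of joinedDF.show() would look roughly like this (row order may differ across runs):

+-----+---+
|value|  y|
+-----+---+
|    c|  1|
|    d|  1|
|    e|  1|
|    f|  2|
|    g|  2|
|    h|  2|
+-----+---+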