Which function in Spark is used to combine two RDDs by keys?

Asked by 鱼传尺愫 on 2021-01-07 19:05

Let us say I have the following two RDDs, with the following key-pair values.

    rdd1 = [ (key1, [value1, value2]), (key2, [value3]) ]
    rdd2 = [ (key1, [value4]), (key3, [value5]) ]

How can I combine them into a single RDD by key?
2 Answers
  • 2021-01-07 19:53

    I would union the two RDDs and do a reduceByKey to merge the values.

    (rdd1 union rdd2).reduceByKey(_ ++ _)
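    Without a Spark cluster at hand, the semantics of `union` followed by `reduceByKey(_ ++ _)` can be sketched in plain Python; the RDD contents below are hypothetical stand-ins matching the question's shape:

    ```python
    from collections import defaultdict

    # Hypothetical stand-ins for the two RDDs: lists of (key, value-list) pairs.
    rdd1 = [("key1", ["v1", "v2"]), ("key2", ["v3"])]
    rdd2 = [("key1", ["v4"]), ("key3", ["v5"])]

    def union_reduce_by_key(a, b):
        # union: concatenate the two pair collections;
        # reduceByKey(_ ++ _): merge the value lists of pairs sharing a key.
        merged = defaultdict(list)
        for k, vs in a + b:
            merged[k].extend(vs)
        return dict(merged)

    result = union_reduce_by_key(rdd1, rdd2)
    # Keys present in only one RDD ("key2", "key3") are kept.
    print(result)
    ```

    Note that this approach keeps every key that appears in either RDD, which is usually the desired "full merge" behavior.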
    
  • 2021-01-07 19:56

    Just use join and then map the resulting rdd.

    rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }
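    The join-based approach behaves differently from union + reduceByKey: `join` is an inner join, so keys present in only one RDD are dropped. A plain-Python sketch (assuming unique keys per RDD; real Spark join would emit one pair per matching combination of duplicates):

    ```python
    # Hypothetical stand-ins for the two RDDs: lists of (key, value-list) pairs.
    rdd1 = [("key1", ["v1", "v2"]), ("key2", ["v3"])]
    rdd2 = [("key1", ["v4"]), ("key3", ["v5"])]

    def join_then_merge(a, b):
        # join: keep only keys present in BOTH inputs, pairing their values;
        # the map then concatenates the two value lists, mirroring
        # rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }.
        right = dict(b)
        return {k: vs + right[k] for k, vs in a if k in right}

    result = join_then_merge(rdd1, rdd2)
    # "key2" and "key3" disappear because they exist in only one RDD.
    print(result)
    ```

    If you need to keep non-matching keys as well, the union + reduceByKey answer above (or Spark's `fullOuterJoin`/`cogroup`) is the better fit.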
    