Merge two RDDs in Spark Scala

清歌不尽 2021-01-21 04:01

I have two RDDs.

rdd1 = (String, String)

key1, value11
key2, value12
key3, value13

rdd2 = (String, String)

key2, value2         

2 Answers
  • 2021-01-21 04:33

    I think this may be what you are looking for:

    join(otherDataset, [numTasks])  
    

    When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

    See the transformations section of the Spark programming guide.
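Here is a minimal sketch that runs join, leftOuterJoin and fullOuterJoin on the rows from the question to show what each returns. It assumes rdd2 holds only the single row shown (the question's listing appears to be cut off), a Spark 2.x/3.x spark-core dependency, and local mode for trying it out. The names JoinSketch and runJoins are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinSketch {
  // Rows from the question; rdd2 is assumed to contain only the one row shown there.
  val rows1: Seq[(String, String)] = Seq(("key1", "value11"), ("key2", "value12"), ("key3", "value13"))
  val rows2: Seq[(String, String)] = Seq(("key2", "value2"))

  // Hypothetical helper: runs three join flavours and collects each result to the driver, sorted by key.
  def runJoins(sc: SparkContext): (
      Array[(String, (String, String))],
      Array[(String, (String, Option[String]))],
      Array[(String, (Option[String], Option[String]))]) = {
    val rdd1: RDD[(String, String)] = sc.parallelize(rows1)
    val rdd2: RDD[(String, String)] = sc.parallelize(rows2)

    // Inner join: only keys present in both RDDs are kept.
    val inner = rdd1.join(rdd2).collect().sortBy(_._1)
    // Left outer join: every key of rdd1; rdd2's value is Some(...) or None.
    val leftOuter = rdd1.leftOuterJoin(rdd2).collect().sortBy(_._1)
    // Full outer join: every key from either RDD; both sides are Options.
    val fullOuter = rdd1.fullOuterJoin(rdd2).collect().sortBy(_._1)
    (inner, leftOuter, fullOuter)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for experimenting; inside an application, reuse the existing SparkContext.
    val sc = new SparkContext(new SparkConf().setAppName("join-sketch").setMaster("local[*]"))
    try {
      val (inner, leftOuter, fullOuter) = runJoins(sc)
      inner.foreach(println)     // (key2,(value12,value2))
      leftOuter.foreach(println) // (key1,(value11,None)) (key2,(value12,Some(value2))) (key3,(value13,None))
      fullOuter.foreach(println) // (key1,(Some(value11),None)) (key2,(Some(value12),Some(value2))) (key3,(Some(value13),None))
    } finally {
      sc.stop()
    }
  }
}
```

Note that the plain join drops key1 and key3 because they have no partner in rdd2; the outer variants keep them and mark the missing side with None.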

  • 2021-01-21 04:41

    Check join() in PairRDDFunctions:

    https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
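join and the outer joins are not methods of RDD itself: they are defined in PairRDDFunctions, and Scala makes them available on any RDD of Tuple2 elements through the implicit conversion RDD.rddToPairRDDFunctions. As one illustration of merging with those methods, the sketch below assumes a merge rule the question does not actually state: keep every key from either RDD, and let rdd2's value win when a key appears in both. The names OverrideMergeSketch and mergePreferRight are hypothetical, and a Spark 2.x/3.x spark-core dependency is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OverrideMergeSketch {
  // Hypothetical merge rule (an assumption, not from the question):
  // every key from either RDD survives, and rdd2's value replaces rdd1's on a shared key.
  def mergePreferRight(rdd1: RDD[(String, String)], rdd2: RDD[(String, String)]): RDD[(String, String)] =
    rdd1.fullOuterJoin(rdd2).mapValues { case (leftValue, rightValue) =>
      // fullOuterJoin never yields (None, None), so .get is safe here.
      rightValue.orElse(leftValue).get
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("merge-sketch").setMaster("local[*]"))
    try {
      val rdd1 = sc.parallelize(Seq(("key1", "value11"), ("key2", "value12"), ("key3", "value13")))
      val rdd2 = sc.parallelize(Seq(("key2", "value2")))
      // Prints (key1,value11), (key2,value2), (key3,value13), one per line.
      mergePreferRight(rdd1, rdd2).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If each key appears at most once in each RDD, rdd2.union(rdd1.subtractByKey(rdd2)) produces the same rows without a join; subtractByKey is also a PairRDDFunctions method.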
