I have two RDDs.
rdd1 = (String, String)
key1, value11
key2, value12
key3, value13
rdd2 = (String, String)
key2, value2
I think this may be what you are looking for:
join(otherDataset, [numTasks])
When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.
See the associated section of the docs
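Applied to the sample data from the question, a minimal sketch could look like the one below. It assumes a local Spark setup with the classic RDD API. The object name `JoinExample`, the app name, and the `local[*]` master are illustrative choices, not something from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinExample {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally; app name and master are placeholders.
    val conf = new SparkConf().setAppName("JoinExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The two pair RDDs from the question.
    val rdd1 = sc.parallelize(Seq(
      ("key1", "value11"),
      ("key2", "value12"),
      ("key3", "value13")))
    val rdd2 = sc.parallelize(Seq(("key2", "value2")))

    // Inner join: keeps only keys present in both RDDs.
    // Result type is RDD[(String, (String, String))].
    val joined = rdd1.join(rdd2)
    joined.collect().foreach(println)
    // (key2,(value12,value2))

    // Left outer join: keeps every key of rdd1; the right-hand value
    // is wrapped in an Option and is None when rdd2 has no match.
    val left = rdd1.leftOuterJoin(rdd2)
    left.collect().sortBy(_._1).foreach(println)
    // (key1,(value11,None))
    // (key2,(value12,Some(value2)))
    // (key3,(value13,None))

    sc.stop()
  }
}
```

If you need the keys of rdd1 that have no match in rdd2, leftOuterJoin followed by a filter on None is one option; which variant fits depends on the output you want.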
Check join() in PairRDDFunctions:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions