I am new to Spark and am trying to understand the difference between a normal RDD and a pair RDD. What are the use cases where a pair RDD is used as opposed to a normal RDD? If possi
The key differences are:
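Pair RDD operations (such as reduceByKey, groupByKey, or mapValues) are available only on RDDs whose elements are (key, value) tuples, and they generally produce (key, value) pairs. Operations on a plain RDD (such as flatMap or reduce) give you a collection of values or a single value. Note that map is a general RDD operation; it yields a pair RDD only when the function you pass returns tuples.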
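Pair RDD operations work per key: values that share a key are grouped or combined together, independently of other keys. Operations on a plain RDD have no notion of a key: flatMap is applied to every element, and reduce aggregates the whole collection. In both cases Spark runs the work in parallel across partitions, so reach for a pair RDD when you need per-key aggregation, grouping, or joins.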
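To make the contrast concrete, here is a small plain-Python model of the semantics. It is not Spark code and runs locally without a cluster. The helper `reduce_by_key` and the sample data are made up for illustration, and the comments name the PySpark call each step is meant to mirror (assuming a SparkContext called `sc`).

```python
from functools import reduce
from itertools import chain
from operator import add

lines = ["a b a", "b c"]  # illustrative sample data

# Plain RDD style: flatMap turns every element into zero or more elements.
# PySpark counterpart: words = sc.parallelize(lines).flatMap(lambda l: l.split())
words = list(chain.from_iterable(line.split() for line in lines))

# Plain RDD style: reduce collapses the whole collection into a single value.
# PySpark counterpart: total_chars = words.map(len).reduce(add)
total_chars = reduce(add, (len(w) for w in words))

# map is a general operation; returning tuples is what makes the result a pair RDD.
# PySpark counterpart: pairs = words.map(lambda w: (w, 1))
pairs = [(w, 1) for w in words]


def reduce_by_key(func, key_value_pairs):
    """Hypothetical local stand-in for reduceByKey: merge values per key."""
    merged = {}
    for key, value in key_value_pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Pair RDD style: values are combined separately for each key.
# PySpark counterpart: counts = pairs.reduceByKey(add)
counts = reduce_by_key(add, pairs)

print(words)        # ['a', 'b', 'a', 'b', 'c']
print(total_chars)  # 5
print(counts)       # {'a': 2, 'b': 2, 'c': 1}
```

The plain operations return either a flat collection or one value, while the keyed operation returns one result per distinct key. In Spark itself the per-key merge is also distributed across partitions, which this local sketch does not attempt to model.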