I am new to Spark and trying to understand the difference between a normal RDD and a pair RDD. What are the use cases where a pair RDD is used instead of a normal RDD? If possi
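Pair RDDs are RDDs whose elements are key/value pairs (in the Java API, a `JavaPairRDD<K, V>` holding `Tuple2<K, V>` elements). You use them when you need to work with data by key: operations such as `reduceByKey`, `groupByKey` and `join` are defined on pair RDDs, not on plain RDDs.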
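To make the key/value idea concrete without a cluster, here is a plain-Java analogy (not Spark code; the class and method names are hypothetical). A "normal" collection holds plain elements, while a pair collection holds (key, value) entries, and a `reduceByKey`-style operation merges all values that share a key:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class PairRddAnalogy {

    // Local stand-in for Spark's reduceByKey: merges the values of all
    // pairs that share a key using the supplied function.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> merge) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            result.merge(p.getKey(), p.getValue(), merge);
        }
        return result;
    }

    public static void main(String[] args) {
        // A "normal" collection: plain elements (here, words).
        List<String> words = List.of("spark", "rdd", "spark", "pair", "spark");

        // A "pair" collection: (key, value) entries, here (word, 1).
        List<Map.Entry<String, Integer>> pairs = words.stream()
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());

        System.out.println(reduceByKey(pairs, Integer::sum)); // {spark=3, rdd=1, pair=1}
    }
}
```

Spark's `JavaPairRDD.reduceByKey` performs the same per-key merge, except that the data is spread across partitions and the merge runs in parallel.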
Example: suppose you have a CSV file with details of the airports in a country (columns: Airport ID, Name of airport, Main city served by airport, Country where airport is located). Reading that file from its path gives a normal RDD with one String element per line:
```java
// sc is a JavaSparkContext; each element of the RDD is one line of the file
JavaRDD<String> airports = sc.textFile("in/airports.text");
```
If we want an RDD of airport names together with the country each airport is located in, we create a pair RDD from the RDD above:
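```java
// requires: import scala.Tuple2;
JavaPairRDD<String, String> airportsPairRDD = airports.mapToPair(line -> {
    String[] fields = line.split(",");          // naive split: quoted commas not handled
    return new Tuple2<>(fields[1], fields[3]);  // (airport name, country)
});
```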
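The sketch below mirrors that airport example in plain Java so it can run without Spark (it is an analogy, not Spark code; the class name, method names and sample rows are hypothetical). `toPair` corresponds to the `mapToPair` step, and `airportsPerCountry` shows the kind of per-key aggregation a pair RDD makes possible: counting airports per country.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AirportPairs {

    // Mirrors the mapToPair step: one CSV line -> (airport name, country).
    // Naive comma split; quoted fields containing commas are not handled.
    static Map.Entry<String, String> toPair(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[1], fields[3]);
    }

    // Mirrors re-keying on the country and then reduceByKey(Integer::sum):
    // counts how many airports each country has.
    static Map<String, Integer> airportsPerCountry(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(toPair(line).getValue(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: Airport ID, Name, Main city, Country.
        List<String> lines = List.of(
                "1,Airport A,City X,Country P",
                "2,Airport B,City Y,Country Q",
                "3,Airport C,City Z,Country P");

        List<Map.Entry<String, String>> pairs =
                lines.stream().map(AirportPairs::toPair).collect(Collectors.toList());
        System.out.println(pairs);                     // [Airport A=Country P, Airport B=Country Q, Airport C=Country P]
        System.out.println(airportsPerCountry(lines)); // {Country P=2, Country Q=1}
    }
}
```

In Spark itself, the per-country count would typically be written on the (name, country) pair RDD by re-keying on the country and reducing, e.g. `.mapToPair(t -> new Tuple2<>(t._2(), 1)).reduceByKey(Integer::sum)`.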