Difference and use-cases of RDD and Pair RDD

一个人的身影 2021-02-19 10:21

I am new to Spark and am trying to understand the difference between a normal RDD and a pair RDD. What are the use cases where a pair RDD is used as opposed to a normal RDD? If possi

4 answers
  •  温柔的废话
    2021-02-19 10:38

    A pair RDD is an RDD whose elements are key/value pairs.

    Example: suppose you have a CSV file with details of airports (columns: airport ID, name of airport, main city served by airport, country where airport is located). We create a normal RDD by reading that CSV file from its path.

    // sc is assumed to be a JavaSparkContext; each element is one line of the file.
    JavaRDD<String> airports = sc.textFile("in/airports.text");
    

    If we want an RDD of airport names and the countries in which they are located, we have to create a pair RDD from the RDD above.

    // Key: airport name (column 1); value: country (column 3).
    JavaPairRDD<String, String> airportsPairRDD = airports.mapToPair(s -> {
        String[] cols = s.split(",");
        return new Tuple2<>(cols[1], cols[3]);
    });
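
As for when a pair RDD is worth having: the key is what makes per-key operations possible (in Spark's Java API, `JavaPairRDD` offers `reduceByKey`, `groupByKey` and `join`, among others). The sketch below is not Spark code. It is a plain-Java stand-in, using only the standard library, that mimics the pairing step and then a per-key count so the idea can be run without a cluster. The class name `PairRddSketch`, both helper methods and the sample rows are made up for illustration, and the key here is the country rather than the airport name, so that grouping by key has something to aggregate.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PairRddSketch {

    // Stand-in for mapToPair: turn one CSV line into a (country, airport name) pair.
    // Assumes the column order ID, name, city, country and no commas inside fields.
    public static Map.Entry<String, String> toCountryNamePair(String line) {
        String[] cols = line.split(",");
        return new SimpleEntry<>(cols[3], cols[1]);
    }

    // Stand-in for a per-key aggregation: count airports per country.
    // A TreeMap keeps the keys sorted so the printed result is stable.
    public static Map<String, Long> countByCountry(List<String> lines) {
        return lines.stream()
                .map(PairRddSketch::toCountryNamePair)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real airport data.
        List<String> lines = Arrays.asList(
                "1,Alpha Airport,Alpha City,Freedonia",
                "2,Beta Airport,Beta City,Freedonia",
                "3,Gamma Airport,Gamma City,Sylvania");
        System.out.println(countByCountry(lines)); // {Freedonia=2, Sylvania=1}
    }
}
```

With a normal RDD of lines there is no key to group on, so this kind of aggregation first needs the pairing step. That is roughly the dividing line: use a plain RDD for element-wise work such as `map` and `filter`, and a pair RDD when you need to aggregate, group or join by key.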
    
