Difference and use-cases of RDD and Pair RDD

一个人的身影 2021-02-19 10:21

I am new to Spark and am trying to understand the difference between a normal RDD and a pair RDD. What are the use cases where a pair RDD is used as opposed to a normal RDD? If possi

4 answers

温柔的废话 (original poster)

2021-02-19 10:38
A pair RDD is an RDD whose elements are key/value pairs.

Example: suppose you have a CSV file with details of the airports in a country. We create a normal RDD by reading that CSV from its path (columns: airport ID, name of airport, main city served by airport, country where airport is located).
```
JavaRDD<String> airports = sc.textFile("in/airports.text");
```
If we want an RDD of airport names together with the country each airport is located in, we have to create a pair RDD from the RDD above.
```
// Key: airport name (column 1); value: country (column 3).
JavaPairRDD<String, String> airportsPairRDD = airports.mapToPair(s -> {
    String[] fields = s.split(",");
    return new Tuple2<>(fields[1], fields[3]);
});
```
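As for when a pair RDD is worth it: the key is what per-key operations such as `reduceByKey` on a `JavaPairRDD` group on. Here is a rough plain-Java sketch (no Spark involved, only `java.util.stream`) of what counting airports per country boils down to. The class name `PairRddSketch`, both method names and the sample rows are made up for illustration, and the country is used as the key here, unlike the snippet above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PairRddSketch {

    // Turn one CSV line into a (country, airport name) pair.
    // Assumes the column layout described above: index 1 = name, index 3 = country.
    public static Map.Entry<String, String> toPair(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[3], fields[1]);
    }

    // Count airports per country by grouping the pairs on their key.
    // This is the local, single-machine analogue of a per-key reduction.
    public static Map<String, Long> countByCountry(List<String> lines) {
        return lines.stream()
                .map(PairRddSketch::toPair)
                .collect(Collectors.groupingBy(
                        e -> e.getKey(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not real airport data.
        List<String> lines = Arrays.asList(
                "1,Alpha Field,Springfield,Freedonia",
                "2,Beta Strip,Shelbyville,Freedonia",
                "3,Gamma Port,Ogdenville,Sylvania");
        System.out.println(countByCountry(lines));
    }
}
```

With a normal RDD of raw lines there is no key to group on, so you would have to parse each line again inside every operation. Once the data is in key/value form, grouping, reducing and joining by key become one call each.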