Difference and use-cases of RDD and Pair RDD

一个人的身影 2021-02-19 10:21

I am new to Spark and am trying to understand the difference between a normal RDD and a pair RDD. What are the use cases where a pair RDD is used as opposed to a normal RDD? If possi

4 answers

一个人的身影 (original poster)

2021-02-19 10:56

Spark provides special operations on RDDs containing key/value pairs. These RDDs are called pair RDDs. Pair RDDs are a useful building block in many programs, as they expose operations that allow you to act on each key in parallel or regroup data across the network. For example, pair RDDs have a reduceByKey() method that can aggregate data separately for each key, and a join() method that can merge two RDDs together by grouping elements with the same key. It is common to extract fields from an RDD (representing, for instance, an event time, customer ID, or other identifier) and use those fields as keys in pair RDD operations.
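
To make the per-key idea concrete, here is a minimal plain-Python sketch of what reduceByKey() and join() compute. It does not use Spark at all: the helpers `reduce_by_key` and `join`, and the sample `events` and `names` data, are hypothetical local stand-ins made up for illustration, and they run on ordinary lists rather than distributed RDDs.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records: (customer_id, event_time, amount).
events = [
    ("c1", "2021-02-01", 10.0),
    ("c2", "2021-02-01", 5.0),
    ("c1", "2021-02-02", 2.5),
]

# Extract a field to use as the key, turning plain records into (key, value) pairs.
pairs = [(customer_id, amount) for customer_id, _, amount in events]


def reduce_by_key(pairs, func):
    """Local stand-in for reduceByKey: combine all values of each key with func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def join(left, right):
    """Local stand-in for an inner join: pair up values that share a key."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


# Aggregate the amounts separately for each customer ID.
totals = reduce_by_key(pairs, lambda a, b: a + b)

# A second hypothetical pair dataset keyed by the same customer ID.
names = [("c1", "Alice"), ("c3", "Carol")]

# Only keys present on both sides survive the join.
joined = join(list(totals.items()), names)
```

A normal RDD would hold the raw `events` records, and you would use general operations such as map or filter on it. Once the records are reshaped into (key, value) pairs, the key-based operations become available. In Spark the grouping and merging are distributed across the cluster, which this local sketch does not attempt to model.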
