
Does coalesce(numPartitions) in Spark cause a shuffle or not?


I have a simple question about a Spark transformation function:

coalesce(numPartitions) - Decrease the number of partitions in the RDD to numPartitions. Useful for running operations more efficiently after filtering down a large dataset.


1 Answer

轻奢々

2020-12-21 12:06

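The coalesce transformation reduces the number of partitions, so use it when you want fewer output partitions than input partitions. Whether it shuffles depends on its shuffle flag, which is false by default. With shuffle=false, coalesce creates a narrow dependency: each new partition simply claims a group of the existing partitions, so no shuffle takes place. With shuffle=true, it performs a full shuffle.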
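The Spark API docs describe the default case with an example: going from 1000 partitions to 100, each of the 100 new partitions claims 10 of the current ones. Below is a minimal pure-Python sketch of that grouping. It does not call Spark; narrow_coalesce_groups is a hypothetical helper, and the contiguous range split is an assumption modeled on the case where partitions have no data-locality preferences (Spark's real coalescer may also group parents by preferred host). What it shows is the key property: every parent partition is assigned whole to exactly one output partition, so no record has to be shuffled.

```python
def narrow_coalesce_groups(prev_partitions, num_partitions):
    """Toy model of rdd.coalesce(num_partitions) with shuffle=False.

    Returns, for each output partition, the indices of the parent partitions
    it reads. Assumes no locality preferences, so parents are split into
    contiguous ranges.
    """
    # Without a shuffle the partition count can only stay the same or shrink.
    target = min(num_partitions, prev_partitions)
    groups = []
    for i in range(target):
        start = i * prev_partitions // target
        end = (i + 1) * prev_partitions // target
        groups.append(list(range(start, end)))
    return groups


groups = narrow_coalesce_groups(1000, 100)
print(len(groups), len(groups[0]), groups[0][:3])  # 100 10 [0, 1, 2]

# Each parent index occurs in exactly one group: a narrow dependency, no shuffle.
all_parents = sorted(p for g in groups for p in g)
print(all_parents == list(range(1000)))  # True
```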

If the requested number of partitions is larger than the current number and you call coalesce without shuffle=true, the number of partitions stays unchanged.

coalesce also does not guarantee that empty partitions are removed. For example, if you have 20 empty partitions and 10 partitions with data, there will still be empty partitions after you call rdd.coalesce(25): only 10 of the 25 output partitions can receive data, so at least 15 remain empty.

If you call coalesce with shuffle=true, it is equivalent to the repartition method, and the data is redistributed roughly evenly across the new partitions.
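The three behaviors above can be illustrated with a small record-level toy model in pure Python (no Spark involved). coalesce_no_shuffle and coalesce_with_shuffle are hypothetical helpers, not Spark APIs. The no-shuffle version merges whole parent partitions using the same contiguous-range assumption as a locality-free coalesce. The shuffle version reassigns records round-robin; as a simplifying assumption it starts each input partition at index % n, whereas Spark picks a pseudo-random start position per input partition.

```python
def coalesce_no_shuffle(partitions, num_partitions):
    """Toy model of rdd.coalesce(n): merge whole parent partitions, never split them."""
    prev = len(partitions)
    target = min(num_partitions, prev)  # without a shuffle the count cannot grow
    merged = []
    for i in range(target):
        start, end = i * prev // target, (i + 1) * prev // target
        merged.append([rec for j in range(start, end) for rec in partitions[j]])
    return merged


def coalesce_with_shuffle(partitions, num_partitions):
    """Toy model of rdd.coalesce(n, shuffle=True), i.e. rdd.repartition(n).

    Every record is reassigned round-robin. Assumption: each input partition
    starts at index % n; Spark uses a pseudo-random start instead.
    """
    out = [[] for _ in range(num_partitions)]
    for index, part in enumerate(partitions):
        position = index % num_partitions
        for rec in part:
            out[position].append(rec)
            position = (position + 1) % num_partitions
    return out


# 10 partitions with data followed by 20 empty ones (30 in total).
parts = [[k] * 3 for k in range(10)] + [[] for _ in range(20)]

grown = coalesce_no_shuffle(parts, 40)
print(len(grown))  # 30: asking for more partitions without a shuffle changes nothing

shrunk = coalesce_no_shuffle(parts, 25)
print(len(shrunk), sum(1 for p in shrunk if not p))  # 25 16: empty partitions survive

# One large partition and three empty ones.
skewed = [list(range(100)), [], [], []]
print([len(p) for p in coalesce_no_shuffle(skewed, 2)])    # [100, 0]
print([len(p) for p in coalesce_with_shuffle(skewed, 4)])  # [25, 25, 25, 25]
```

The skewed case shows the practical trade-off: a narrow coalesce can never split a large partition, while a shuffle can rebalance the data at the cost of moving every record.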
