Does coalesce(numPartitions) in Spark trigger a shuffle or not?

日久生厌 2020-12-21 11:11

I have a simple question about a Spark transformation function.

coalesce(numPartitions) - Decrease the number of partitions in the RDD to numPartitions. Useful for runn

1 answer
  • 2020-12-21 12:06

    The coalesce transformation is used to reduce the number of partitions. Use coalesce when the number of output partitions is smaller than the number of input partitions. Whether it shuffles the RDD depends on the shuffle flag, which is disabled by default (i.e. false).

    If the requested number of partitions is larger than the current number and you call coalesce without shuffle=true, the number of partitions remains unchanged. coalesce doesn't guarantee that empty partitions will be removed. For example, if you have 20 empty partitions and 10 partitions with data, there will still be empty partitions after you call rdd.coalesce(25). If you use coalesce with shuffle set to true, it is equivalent to the repartition method, and the data will be distributed roughly evenly across the partitions.
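To make the three cases above concrete, here is a toy model in plain Python. It is only a sketch, not Spark's implementation: the `coalesce` function below is a hypothetical stand-in, partitions are modelled as lists of integers, the no-shuffle path simply merges contiguous runs of parent partitions (Spark's real coalescer also takes data locality into account), and the shuffle path uses a plain round-robin.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, num_partitions: int, shuffle: bool = False) -> Partitions:
    """Toy stand-in for RDD.coalesce (assumption: simplified grouping rules)."""
    if shuffle:
        # Full shuffle: every element may move, so the partition count can
        # grow or shrink and the data ends up spread roughly evenly.
        out: Partitions = [[] for _ in range(num_partitions)]
        i = 0
        for part in parts:
            for x in part:
                out[i % num_partitions].append(x)
                i += 1
        return out
    if num_partitions >= len(parts):
        # Without a shuffle the partition count cannot grow: no-op.
        return [list(p) for p in parts]
    # Narrow merge: each output partition is the concatenation of a
    # contiguous run of parent partitions; no element is redistributed.
    n = len(parts)
    out = []
    for i in range(num_partitions):
        start = i * n // num_partitions
        end = (i + 1) * n // num_partitions
        merged: List[int] = []
        for p in parts[start:end]:
            merged.extend(p)
        out.append(merged)
    return out


# 10 partitions with 4 elements each, followed by 20 empty partitions.
parts: Partitions = [[k * 4 + j for j in range(4)] for k in range(10)] + [[] for _ in range(20)]

narrow = coalesce(parts, 25)                   # fewer partitions, no shuffle
grown = coalesce(parts, 40)                    # more partitions requested, no shuffle
shuffled = coalesce(parts, 25, shuffle=True)   # behaves like repartition(25)

print(len(narrow), sum(1 for p in narrow if not p))
print(len(grown))
print(len(shuffled), sorted({len(p) for p in shuffled}))
```

In this model, the no-shuffle call with 25 still leaves empty partitions, the no-shuffle call with 40 keeps all 30 partitions, and only the shuffle variant balances the data. On a real RDD the corresponding calls would be `rdd.coalesce(25)` and `rdd.coalesce(25, shuffle=True)`, with `rdd.getNumPartitions()` to check the resulting count.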
