Spark - repartition() vs coalesce()

误落风尘 2020-11-22 17:11

According to Learning Spark

Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions.

14 answers
  •  花落未央
    2020-11-22 17:18

    In simple terms:

    COALESCE:- can only decrease the number of partitions. No shuffling of data takes place; it just merges existing partitions together.

    REPARTITION:- can both increase and decrease the number of partitions, but a full shuffle takes place.

    Example:-

    val rdd = sc.textFile("path", 7)    // RDD with 7 partitions
    val more = rdd.repartition(10)      // new RDD with 10 partitions (full shuffle)
    val fewer = rdd.repartition(2)      // new RDD with 2 partitions (full shuffle)
    

    Both work fine, since repartition() can go in either direction.

    Generally we reach for these when we need to collapse the output into a single partition (for example, coalesce(1) to write one output file).
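
    The snippet above only exercises repartition(). A complementary sketch of coalesce() behavior, assuming a SparkContext named sc (e.g. from a local[*] session) and an input path of your choosing:

    // Sketch, assuming `sc` is an existing SparkContext.
    val rdd = sc.textFile("path", 7)       // start with 7 partitions

    // coalesce merges existing partitions, so shrinking needs no shuffle.
    val narrowed = rdd.coalesce(2)
    println(narrowed.getNumPartitions)     // 2

    // Asking coalesce for MORE partitions is silently capped at the
    // current count unless a shuffle is forced.
    println(rdd.coalesce(10).getNumPartitions)                  // still 7
    println(rdd.coalesce(10, shuffle = true).getNumPartitions)  // 10

    // repartition(n) is shorthand for coalesce(n, shuffle = true),
    // which is why it can grow the count but always pays for a shuffle.
    println(rdd.repartition(10).getNumPartitions)               // 10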
