Spark - repartition() vs coalesce()


According to Learning Spark

Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions.
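
To make the quoted point concrete, here is a minimal sketch (Scala, Spark SQL API) that compares the physical plans of the two calls. The app name, master setting, and partition counts are illustrative assumptions, not from the book.

```scala
// Hypothetical standalone example: repartition() always triggers a shuffle
// (an Exchange node in the physical plan), while coalesce() to fewer
// partitions does not. Partition counts are arbitrary.
import org.apache.spark.sql.SparkSession

object RepartitionVsCoalescePlan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-vs-coalesce")
      .master("local[4]") // assumption: run locally for the demo
      .getOrCreate()

    val df = spark.range(0, 1000000).toDF("id")

    val repartitioned = df.repartition(2) // full shuffle of the data
    val coalesced     = df.coalesce(2)    // merges existing partitions, no shuffle

    repartitioned.explain() // look for an "Exchange RoundRobinPartitioning(2)" node
    coalesced.explain()     // look for "Coalesce 2" with no Exchange node

    println(s"repartition -> ${repartitioned.rdd.getNumPartitions} partitions")
    println(s"coalesce    -> ${coalesced.rdd.getNumPartitions} partitions")

    spark.stop()
  }
}
```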

14 Answers

    You should also make sure that the nodes receiving the coalesced data are well provisioned if you are dealing with huge data, because all of that data will be loaded onto those nodes and may cause an out-of-memory exception. Although repartition() is costly, I prefer to use it, since it shuffles the data and distributes it equally across partitions.

    Choose wisely between coalesce() and repartition().
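
A minimal sketch (Scala, RDD API) of the answer's point about even distribution: repartition() reshuffles into roughly equal partitions, while coalesce() only merges existing partitions, so any input skew carries over. The data, filter, and partition counts below are made up for illustration.

```scala
// Hypothetical example: compare partition sizes after repartition() vs coalesce()
// on a deliberately skewed RDD.
import org.apache.spark.sql.SparkSession
import org.apache.spark.rdd.RDD

object PartitionSizes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-sizes")
      .master("local[4]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Skewed input: 8 partitions, but the filter keeps only values that
    // land in the first partition, so the other partitions are empty.
    val skewed = sc.parallelize(1 to 800000, numSlices = 8).filter(_ <= 100000)

    // repartition() performs a full shuffle and balances the output.
    val balanced = skewed.repartition(4)

    // coalesce() without shuffle merges neighbouring partitions, so the
    // skew of the input is preserved in the output.
    val merged = skewed.coalesce(4)

    def sizes(rdd: RDD[Int]): String =
      rdd.glom().map(_.length).collect().mkString(", ")

    println(s"repartition(4) partition sizes: ${sizes(balanced)}")
    println(s"coalesce(4)    partition sizes: ${sizes(merged)}")

    spark.stop()
  }
}
```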
