Spark: shuffle operation leading to long GC pause

自闭症患者 2021-02-10 16:35

I'm running Spark 2 and am trying to shuffle around 5 terabytes of JSON. I'm running into very long garbage collection pauses during shuffling of a Dataset<

1 answer
  •  鱼传尺愫
    2021-02-10 17:10

    Adding the following flags got rid of the GC pauses.

    spark.executor.extraJavaOptions -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12
    

    I think it does take a fair amount of tweaking, though. This Databricks post was very helpful.
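
A minimal sketch of passing the same flags per job on the `spark-submit` command line instead of in `spark-defaults.conf`. The class name and jar path are hypothetical placeholders, and the flag values are simply the ones quoted in the answer, not tuned recommendations.

```shell
# Same G1 flags as in the answer, passed per job via --conf.
# com.example.ShuffleJob and shuffle-job.jar are hypothetical placeholders.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12" \
  --class com.example.ShuffleJob \
  shuffle-job.jar
```

`-XX:InitiatingHeapOccupancyPercent=35` starts G1's concurrent marking cycle at 35% heap occupancy rather than the default 45%, so collection begins earlier. `-XX:ConcGCThreads=12` presumably suited the answerer's executors and will likely need adjusting to the cores available on other hardware.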
