Spark: shuffle operation leading to long GC pause

[愿得一人] 2021-02-10 16:32

I'm running Spark 2 and am trying to shuffle around 5 terabytes of JSON. I'm running into very long garbage collection pauses during shuffling of a Dataset<

1 answer
  • 2021-02-10 17:23

    Adding the following flags got rid of the GC pauses.

    spark.executor.extraJavaOptions -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12
    

    It does take a fair amount of tweaking, though. This Databricks post was very helpful.
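The setting in the answer is written as a key followed by its value, which is the form used in spark-defaults.conf. As a minimal sketch, the same flags could also be passed per job with spark-submit's `--conf` option. The wrapper function `submit_with_g1` and its two arguments (main class and application jar) are hypothetical names introduced here for illustration; they are not from the original answer.

```shell
# G1 flags copied from the answer; Spark passes them to each executor JVM.
GC_OPTS="-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12"

# Hypothetical wrapper: the caller supplies the main class and the jar path.
submit_with_g1() {
  main_class="$1"
  app_jar="$2"
  # The whole option string must be quoted so it stays a single --conf value.
  spark-submit \
    --conf "spark.executor.extraJavaOptions=${GC_OPTS}" \
    --class "${main_class}" \
    "${app_jar}"
}
```

A call would look like `submit_with_g1 com.example.ShuffleJob shuffle-job.jar`, where both names are placeholders. For reference, `-XX:+UseG1GC` selects the G1 collector, `-XX:InitiatingHeapOccupancyPercent` sets the heap occupancy at which G1 starts a concurrent marking cycle, and `-XX:ConcGCThreads` sets the number of concurrent marking threads. The values 35 and 12 are the answerer's and would likely need adjusting for other heap sizes and core counts.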
