Question
Running jobs on a Spark 2.3 cluster, I noticed in the Spark web UI that spill occurs for some tasks:
My understanding is that on the reduce side, the reducer fetches the needed partitions (shuffle read), then performs the reduce computation using the executor's execution memory. Because there was not enough execution memory, some data was spilled.
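To make this concrete, here is a minimal sketch of the kind of job where I see this behaviour (the data sizes, partition count, and key skew below are illustrative assumptions, not my actual job): `groupByKey` materializes all of a key's values on the reduce side, so with few keys and many values a reducer's working set can exceed its execution memory, which is when Spark spills.

```scala
import org.apache.spark.sql.SparkSession

object SpillExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spill-example")
      .getOrCreate()
    val sc = spark.sparkContext

    // Wide shuffle: 10 keys over 10M rows means each reducer has to
    // buffer very large groups; if a group does not fit in execution
    // memory, Spark spills the in-memory data structures.
    val groupSizes = sc.parallelize(1 to 10000000, numSlices = 100)
      .map(i => (i % 10, i.toLong)) // few keys -> large groups per reducer
      .groupByKey()
      .mapValues(_.size)
      .collect()

    groupSizes.foreach(println)
    spark.stop()
  }
}
```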
My questions:
- Am I correct?
- Where is the data spilled? The Spark web UI states that some data is spilled to memory ("Shuffle Spill (Memory)"), but nothing is spilled to disk ("Shuffle Spill (Disk)"); see the listener sketch below for how I read these counters.
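For reference, the same counters the UI aggregates can also be read programmatically with a `SparkListener`. This is only a sketch (the `SpillListener` class and its logging are my own), but `memoryBytesSpilled` and `diskBytesSpilled` are the actual `TaskMetrics` fields behind those two UI columns:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that logs per-task spill metrics:
// memoryBytesSpilled is the in-memory size of the spilled data,
// diskBytesSpilled its size as written to disk.
class SpillListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null && (m.memoryBytesSpilled > 0 || m.diskBytesSpilled > 0)) {
      println(s"task ${taskEnd.taskInfo.taskId}: " +
        s"spill(memory)=${m.memoryBytesSpilled} B, " +
        s"spill(disk)=${m.diskBytesSpilled} B")
    }
  }
}

// Usage: register the listener before running the job, e.g.
//   sc.addSparkListener(new SpillListener)
```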
Thanks in advance for your help.
Source: https://stackoverflow.com/questions/51103971/spark-shuffle-spill-metrics