I'm running a Spark cluster in standalone mode and my application using spark-submit. In the Spark UI stages section I found an executing stage with a large execution time (> 10h, when usual
Armin's answer is very good. I just wanted to point out what worked for me.
The same problem went away when I increased the parameter:
spark.default.parallelism
from 28 (which was the number of executors that I had) to 84 (which is the number of available cores).
NOTE: this is not a rule for setting this parameter, this is only what worked for me.
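Here is a minimal sketch of how that setting can be applied when building the SparkContext. The app name is a placeholder and 84 is simply the value that worked in my case; adjust it to your own core count:

    import org.apache.spark.{SparkConf, SparkContext}

    // Raise the default shuffle parallelism from the executor count (28)
    // to the total number of available cores (84 in my setup).
    val conf = new SparkConf()
      .setAppName("MyApp")                     // placeholder application name
      .set("spark.default.parallelism", "84")  // value that worked for me

    val sc = new SparkContext(conf)

Since the question uses spark-submit, the same setting can also be passed on the command line with --conf spark.default.parallelism=84 instead of hard-coding it.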
UPDATE: This approach is also backed by Spark's documentation:
Sometimes, you will get an OutOfMemoryError not because your RDDs don’t fit in memory, but because the working set of one of your tasks, such as one of the reduce tasks in groupByKey, was too large. Spark’s shuffle operations (sortByKey, groupByKey, reduceByKey, join, etc) build a hash table within each task to perform the grouping, which can often be large. The simplest fix here is to increase the level of parallelism, so that each task’s input set is smaller. Spark can efficiently support tasks as short as 200 ms, because it reuses one executor JVM across many tasks and it has a low task launching cost, so you can safely increase the level of parallelism to more than the number of cores in your clusters.
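The same idea can be applied per operation rather than globally: shuffle operators such as reduceByKey and groupByKey accept an explicit partition count, so each task's working set stays small. A rough sketch, assuming the SparkContext sc from the snippet above; the input path and the value 84 are illustrative only:

    // Word count with an explicit level of parallelism for the shuffle stage.
    val counts = sc.textFile("hdfs:///some/input")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 84)                        // 84 partitions instead of the default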