I'm trying to run Spark 1.5 on Mesos in cluster mode. I'm able to launch the dispatcher and to run spark-submit, but when I do so, the Spark driver fails with the following error:
I was getting similar issues and used trial and error to find the cause and a solution. I may not be able to give the 'real' reason, but trying the approach below can help you resolve it.
Try launching spark-shell with memory and core parameters:
spark-shell \
  --driver-memory=2g \
  --executor-memory=7g \
  --num-executors=8 \
  --executor-cores=4 \
  --conf "spark.storage.memoryFraction=1" \
  --conf "spark.akka.frameSize=200" \
  --conf "spark.default.parallelism=100" \
  --conf "spark.core.connection.ack.wait.timeout=600" \
  --conf "spark.yarn.executor.memoryOverhead=2048" \
  --conf "spark.yarn.driver.memoryOverhead=400"

Notes on these settings:

- spark.storage.memoryFraction=1: important.
- spark.akka.frameSize: keep it sufficiently high; higher than 100 is a good thing.
- spark.yarn.executor.memoryOverhead (in MB): not really valid for the shell, but a good thing for spark-submit.
- spark.yarn.driver.memoryOverhead (in MB): also mainly for spark-submit, not the shell; minimum 384.
Now, if the total memory (driver memory + num executors × executor memory) exceeds the available memory, it will throw an error. I believe that's not the case for you.
Keep executor cores small, say 2 or 4.

executor memory = (total memory - driver memory) / number of executors ... actually a little less, to leave room for overhead.
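To make that formula concrete, here is a small sizing calculation. The numbers (64 GB pool, 2 GB driver, 8 executors, 10% headroom) are illustrative assumptions, not values from the original question:

```shell
# Hypothetical sizing: a 64 GB worker pool, 2 GB reserved for the driver,
# and 8 executors. All numbers are illustrative.
TOTAL_MEM_GB=64
DRIVER_MEM_GB=2
NUM_EXECUTORS=8

# Raw share per executor (integer division)
RAW_GB=$(( (TOTAL_MEM_GB - DRIVER_MEM_GB) / NUM_EXECUTORS ))

# "Actually a little less": subtract ~10% headroom for memory overhead
EXECUTOR_MEM_GB=$(( RAW_GB * 9 / 10 ))

echo "executor-memory: ${EXECUTOR_MEM_GB}g"   # prints "executor-memory: 6g"
```

Rounding down here is deliberate: over-provisioning executor memory is what gets containers killed, so err on the low side.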
Next, run your code at the spark-shell prompt and check how much memory is being used in the Executors tab of the Spark UI.
What I understood (empirically, though) is that the following types of problems can occur:
I hope this helps you find the right configuration. Once that's set, you can use the same configuration when submitting a spark-submit job.
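For example, carrying the settings above over to spark-submit could look like the sketch below. The master URL, deploy mode, application class, and jar path are placeholders, not details from the original post:

```shell
# Sketch: same tuning flags as the spark-shell run, applied to spark-submit.
# com.example.MyApp and the jar path are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --driver-memory 2g \
  --executor-memory 7g \
  --num-executors 8 \
  --executor-cores 4 \
  --conf spark.storage.memoryFraction=1 \
  --conf spark.akka.frameSize=200 \
  --conf spark.default.parallelism=100 \
  --conf spark.core.connection.ack.wait.timeout=600 \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.yarn.driver.memoryOverhead=400 \
  /path/to/my-app.jar
```

Unlike in the shell, the two `spark.yarn.*.memoryOverhead` settings do take effect here, which is why they were worth setting aside earlier.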
Note: I was working on a cluster with a lot of resource constraints and multiple users using it in ad hoc ways, which made available resources uncertain, so the calculations had to stay on the 'safer' side. This resulted in a lot of iterative experiments.