I am running Apache Spark in cluster mode using Apache Mesos. But when I start spark-shell to run a simple test command (sc.parallelize(0 to 10, 8).count), I receive the following error.
While most of the other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to highlight that a firewall can cause exactly the same issue, especially when you are running Spark on a cloud platform.
If you can see your Spark slaves in the web UI, you have probably already opened the standard ports (8080, 8081, 7077, 4040). However, when you actually run a job, Spark also uses SPARK_WORKER_PORT, spark.driver.port, and spark.blockManager.port, which are randomly assigned by default. If your firewall blocks these ports, the master cannot retrieve any job-specific response from the slaves and returns the error.
Alternatively, you can run a quick test by temporarily opening all the ports and checking whether the slaves accept jobs.
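For example (with <master-host> as a placeholder for your own master), once the ports are reachable you can re-run the same test from the question in spark-shell:

    $ spark-shell --master mesos://<master-host>:5050
    scala> sc.parallelize(0 to 10, 8).count   // 0 to 10 inclusive, so this should return 11

If the count comes back instead of the job hanging with the "not accepted any resources" style of error, the firewall rules were the problem.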