Getting the below error with respect to the container while submitting a Spark application to YARN. The Hadoop (2.7.3) / Spark (2.1) environment is running in pseudo-distributed mode.
You can keep increasing spark.network.timeout until you stop seeing the problem, as mentioned by himanshuIIITian in the comments.
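A minimal sketch of how that setting might be raised when building the session; the 800s value and the app name are just illustrative placeholders, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// Raise the network timeout above the 120s default.
// "800s" is only an example value; tune it for your workload.
val spark = SparkSession.builder()
  .appName("timeout-tuning-example")
  .config("spark.network.timeout", "800s")
  .getOrCreate()
```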
Timeout exceptions can occur when Spark is under a heavy workload. If executor memory is low, GC may keep the system very busy, which increases the load further. Check the logs for an OutOfMemoryError. Enable -XX:+PrintGCDetails -XX:+PrintGCTimeStamps in spark.executor.extraJavaOptions and check the logs to see whether full GC is invoked many times before a task completes. If that is the case, increase your executor memory. That should hopefully solve your problem.
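A minimal sketch of setting those options programmatically before the application starts; the 4g executor memory and the app name are placeholders to adjust for your own workload:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Print GC details in the executor logs and give executors more heap.
// These values are examples; size the memory for your own jobs.
val conf = new SparkConf()
  .setAppName("gc-diagnostics-example")
  .set("spark.executor.memory", "4g")
  .set("spark.executor.extraJavaOptions",
       "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")

val spark = SparkSession.builder().config(conf).getOrCreate()
```

The same settings can be passed on the command line via spark-submit --conf key=value if you prefer not to hard-code them in the application.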
For me it was the firewall settings on the Spark cluster that prevented the executors from connecting correctly. I couldn't figure that out promptly, because the Spark UI showed all workers connected to the master, while other connections were silently blocked by my firewall. After setting the following ports explicitly and allowing them through the firewall, the problem was solved (note that Spark uses random ports for these settings by default):

spark.driver.port
spark.blockManager.port
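A minimal sketch of pinning those two ports to fixed values so that firewall rules can be written against them; the port numbers and the app name are arbitrary examples:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Pin the driver and block-manager ports instead of letting Spark pick
// random ones, so the firewall can allow exactly these ports.
// 40000 and 40001 are arbitrary example ports.
val conf = new SparkConf()
  .setAppName("fixed-ports-example")
  .set("spark.driver.port", "40000")
  .set("spark.blockManager.port", "40001")

val spark = SparkSession.builder().config(conf).getOrCreate()
```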