I am trying to submit a Python job to a Spark cluster with two worker nodes, but I keep seeing the following problem, which eventually causes spark-submit to fail:
This happens when the Python worker process fails to connect to the Spark executor JVM. Spark uses sockets to communicate with the worker process. There are many reasons this could happen, and the specific details will most likely be in the logs on the executor/worker machines.
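One quick thing you can rule out on each worker machine is whether local TCP connections work at all. The sketch below assumes the worker and the JVM talk over a loopback socket, which I believe is the case but have not verified for your Spark version. The helper names `can_connect` and `loopback_check` are made up for this example and are not part of Spark. It does not reproduce Spark's actual handshake; it only opens a listener on 127.0.0.1 and tries to connect back to it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def loopback_check(timeout: float = 5.0) -> bool:
    """Open a listener on 127.0.0.1 and try to connect back to it.

    Hypothetical diagnostic: it only shows that loopback TCP works on this
    machine, not that Spark's own worker/JVM connection will succeed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        return can_connect("127.0.0.1", port, timeout)


if __name__ == "__main__":
    print("loopback socket OK" if loopback_check() else "loopback socket FAILED")
```

If this fails on a worker node, look at local firewall rules or hostname/loopback configuration on that machine first. If it passes, the cause is somewhere else and the executor logs are still the place to look.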