Forcing driver to run on specific slave in spark standalone cluster running with “--deploy-mode cluster”

Submitted by 柔情痞子 on 2020-01-04 04:02:29

Question


I am running a small Spark cluster with two EC2 instances (m4.xlarge).

So far I have been running the Spark master on one node and a single Spark slave (4 cores, 16 GB memory) on the other, then deploying my Spark (streaming) app in client deploy-mode on the master. A summary of the settings is:

--executor-memory 16g

--executor-cores 4

--driver-memory 8g

--driver-cores 2

--deploy-mode client
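For reference, a minimal sketch of the full client-mode submission under these settings is below; the master URL, main class, and application JAR are placeholders, not taken from my actual setup:

# Sketch of the client-mode submission described above.
# <master-host>, the class name and the JAR file are hypothetical placeholders.
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  --driver-memory 8g \
  --driver-cores 2 \
  --executor-memory 16g \
  --executor-cores 4 \
  --class com.example.StreamingApp \
  my-streaming-app.jar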

This results in a single executor on my single slave running with 4 cores and 16 GB memory. The driver runs "outside" the cluster on the master node (i.e. it is not allocated its resources by the master).

Ideally I'd like to use cluster deploy-mode so that I can take advantage of the supervise option. I have started a second slave on the master node, giving it 2 cores and 8 GB memory (fewer resources, so as to leave room for the master daemon).

When I run my Spark job in cluster deploy-mode (using the same settings as above but with --deploy-mode cluster), about 50% of the time I get the desired deployment: the driver runs on the slave that lives on the master node (which has the right resources of 2 cores and 8 GB), leaving the original slave node free to allocate an executor with 4 cores and 16 GB. The other 50% of the time, however, the master places the driver on the non-master slave node, which means that node holds a driver with 2 cores and 8 GB of memory and no node is left with sufficient resources to start an executor (which requires 4 cores and 16 GB).
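For reference, the cluster-mode submission would look something like this (same placeholders as in the client-mode sketch above):

# Sketch of the cluster-mode submission; placeholders as before.
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --driver-memory 8g \
  --driver-cores 2 \
  --executor-memory 16g \
  --executor-cores 4 \
  --class com.example.StreamingApp \
  my-streaming-app.jar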

Is there any way to force the Spark master to use a specific worker/slave for my driver? Given that Spark knows there are two slave nodes, one with 2 cores and the other with 4, and that my driver needs 2 cores while my executor needs 4, it should ideally work out the optimal placement, but that doesn't seem to be the case.

Any ideas / suggestions gratefully received!

Thanks!


Answer 1:


I can see that this is an old question, but let me answer it anyway; someone might find it useful.

Add the --driver-java-options="-Dspark.driver.host=<HOST>" option to your spark-submit command when submitting the application, and Spark should deploy the driver to the specified host.
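As a concrete sketch of how that would look with the asker's settings (the driver host, master URL, class, and JAR below are placeholders, not part of the original answer):

# Cluster-mode submission with the driver host pinned via spark.driver.host.
# <driver-host> is the hostname of the worker that should run the driver;
# the other placeholders are hypothetical, as in the question's sketches.
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --driver-java-options="-Dspark.driver.host=<driver-host>" \
  --driver-memory 8g \
  --driver-cores 2 \
  --executor-memory 16g \
  --executor-cores 4 \
  --class com.example.StreamingApp \
  my-streaming-app.jar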



Source: https://stackoverflow.com/questions/40526723/forcing-driver-to-run-on-specific-slave-in-spark-standalone-cluster-running-with
