Spark standalone connection driver to worker

北慕城南 submitted on 2019-11-29 12:50:55

TL;DR Make sure that spark.driver.host:spark.driver.port can be accessed from each node in the cluster.
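One way to test this is to run a plain TCP probe from each worker and from the master. The sketch below is only an illustration and uses nothing Spark-specific: the function name can_reach is made up here, and the host and port in the usage comment are placeholders for whatever spark.driver.host and spark.driver.port resolve to in your setup.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (placeholder values, run on every worker and on the master):
#   can_reach("driver.example.internal", 40000)
```

A False result on any node points at a firewall rule, a closed port, or a spark.driver.host value that does not resolve from that node.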

In general, you have to ensure that all nodes (both the executors and the master) can reach the driver.

  • In cluster mode, where the driver runs on one of the worker nodes, this is satisfied by default, as long as the required ports are not blocked (see below).
  • In client mode, the machine on which the driver was started has to be accessible from the cluster. This means that spark.driver.host has to resolve to an address that every cluster node can reach.

In both cases, keep in mind that by default the driver listens on a random port. You can use a fixed one by setting spark.driver.port. This does not work well if you want to submit multiple applications at the same time, because drivers started on the same machine cannot share a port.
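If you do pin the port, each concurrently running application needs its own value. A minimal sketch of one way to do that, assuming you launch jobs with spark-submit and pass settings through --conf: the helper name driver_conf_args, the base port and the per-application index are all hypothetical choices for illustration.

```python
from typing import List


def driver_conf_args(driver_host: str, base_port: int, app_index: int) -> List[str]:
    """Build spark-submit --conf arguments that give each application its own driver port.

    `base_port` and `app_index` are illustrative: application 0 gets base_port,
    application 1 gets base_port + 1, and so on.
    """
    port = base_port + app_index
    if not 1024 <= port <= 65535:
        raise ValueError(f"driver port {port} is outside the usable range")
    return [
        "--conf", f"spark.driver.host={driver_host}",
        "--conf", f"spark.driver.port={port}",
    ]


# Usage (placeholder address and port):
#   args = ["spark-submit"] + driver_conf_args("10.0.0.5", 40000, 1) + ["app.py"]
```

With this scheme only the range starting at the base port has to be opened in the firewall, instead of every port the driver might pick at random.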

Furthermore:

"when the file is only present on worker"

won't work. All inputs have to be accessible from the driver as well as from each executor node.
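As a rough pre-flight check, you can flag input paths that probably point at a node-local filesystem. This is only a heuristic sketch with a made-up function name: it assumes that a path with no scheme resolves to the local filesystem, which holds only when no other default filesystem (such as HDFS) is configured.

```python
from urllib.parse import urlparse


def needs_copy_on_every_node(path: str) -> bool:
    """Return True if `path` looks node-local, so the file must exist on the driver and on every executor node.

    Assumption: a path without a scheme is treated as local. If your cluster
    configures a different default filesystem, such paths resolve there instead.
    """
    scheme = urlparse(path).scheme
    return scheme in ("", "file")


# Usage (placeholder paths):
#   needs_copy_on_every_node("/data/input.csv")                   # node-local
#   needs_copy_on_every_node("hdfs://namenode:8020/data/in.csv")  # shared storage
```

Paths flagged this way either have to be copied to the same location on every node or moved to storage that all nodes can reach.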
