Scala Spark connect to remote cluster


Question


I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.

 import org.apache.spark.SparkConf

 val conf = new SparkConf()
   .setAppName("MyAppName")
   .setMaster("spark://my_ip:7077")

Where my_ip is the IP address of my cluster. Unfortunately, I get connection refused. So, I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.


Answer 1:


There are two things missing:

  • The cluster manager should be set to YARN (setMaster("yarn")) and the deploy mode to cluster; your current setup targets Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
  • You also need to copy the yarn-site.xml and core-site.xml files from the cluster into HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the address of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html

By the way, this works straightforwardly if you use spark-submit to submit the job; achieving it programmatically is more complex, and you could only use yarn-client mode, which is tricky to set up remotely. A minimal sketch of the programmatic route follows.
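
Here is a minimal sketch of that programmatic yarn-client setup. It assumes HADOOP_CONF_DIR on the submitting machine points at copies of the cluster's yarn-site.xml and core-site.xml and that the spark-yarn module is on the classpath; the app name and the trivial job are just placeholders:

 import org.apache.spark.{SparkConf, SparkContext}

 // Assumes HADOOP_CONF_DIR on this machine points at copies of the cluster's
 // yarn-site.xml and core-site.xml, and that spark-yarn is on the classpath.
 val conf = new SparkConf()
   .setAppName("MyAppName")
   .setMaster("yarn")                        // YARN is the cluster manager; the ResourceManager
                                             // address is read from yarn-site.xml, not hard-coded
   .set("spark.submit.deployMode", "client") // a programmatic launch effectively means client mode

 val sc = new SparkContext(conf)
 sc.parallelize(1 to 100).count()            // trivial job just to confirm the connection works
 sc.stop()

Note that no IP is hard-coded here: Spark locates the YARN ResourceManager through the Hadoop configuration files it finds under HADOOP_CONF_DIR.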




Answer 2:


  1. In order to launch an application on YARN with Spark, pass --master yarn to your spark-submit command or call setMaster("yarn") when initializing the app configuration.
  2. If you need to issue the "spark-submit" command from a remote host, the popular Java Secure Channel (JSch) library can be used; of course, the environment must be set up properly on the cluster side (see the sketch below).
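
As a rough illustration of the second point, below is a minimal JSch sketch that opens an SSH session to the cluster's edge node and runs spark-submit there. The host name, credentials, class name, and jar path are hypothetical placeholders:

 import com.jcraft.jsch.{ChannelExec, JSch}
 import scala.io.Source

 // Hypothetical host, user, password, and jar path; replace with your own
 // (key-based auth would be preferable to a plain-text password).
 val jsch = new JSch()
 val session = jsch.getSession("hadoop", "edge-node.example.com", 22)
 session.setPassword("secret")
 session.setConfig("StrictHostKeyChecking", "no")
 session.connect()

 val channel = session.openChannel("exec").asInstanceOf[ChannelExec]
 // spark-submit runs on the cluster side, where the YARN environment is already configured.
 channel.setCommand(
   "spark-submit --master yarn --deploy-mode cluster " +
   "--class com.example.MyApp /home/hadoop/my-app.jar")
 channel.setErrStream(System.err)
 val in = channel.getInputStream
 channel.connect()

 // Relay whatever spark-submit prints to stdout.
 Source.fromInputStream(in).getLines().foreach(println)

 channel.disconnect()
 session.disconnect()

Running spark-submit on the cluster side this way sidesteps the remote yarn-client issues mentioned in the first answer, since the YARN configuration already lives there.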


Source: https://stackoverflow.com/questions/43630494/scala-spark-connect-to-remote-cluster
