Question
I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.
val conf = new SparkConf()
.setAppName("MyAppName")
.setMaster("spark://my_ip:7077")
Where my_ip is the IP address of my cluster. Unfortunately, I get connection refused. So, I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.
Answer 1:
There are two things missing:
- The cluster manager should be set to yarn (setMaster("yarn")) and the deploy-mode to cluster; your current setup is for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
- Also, you need to get the yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the IP of your master node (see the sketch below). More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html
By the way, this would work if you use spark-submit to submit a job; programmatically it is more complex to achieve, and you could only use yarn-client mode, which is tricky to set up remotely.
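A minimal sketch of that programmatic (yarn-client style) setup might look like the following, assuming HADOOP_CONF_DIR is exported in the environment of the JVM that creates the SparkContext and points at a directory containing the cluster's yarn-site.xml and core-site.xml; the app name is carried over from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Assumes HADOOP_CONF_DIR (and optionally YARN_CONF_DIR) is set in this JVM's
// environment and points at the directory holding the cluster's yarn-site.xml
// and core-site.xml, so Spark can locate the YARN ResourceManager.
val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("yarn") // cluster manager is YARN, not spark://host:port

// Creating the context in-process effectively runs in yarn-client mode:
// the driver stays on this machine while the executors run on the cluster.
val sc = new SparkContext(conf)

// ... build and run jobs with sc ...

sc.stop()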
Answer 2:
- In order to launch an application on YARN using Spark, you should use --master yarn for your spark-submit command, or setMaster("yarn") in the application configuration initialization.
- If the spark-submit command needs to be sent from a remote host, the popular Java Secure Channel (JSch) library can be used; of course, the environment parameters must be set properly on the cluster (see the sketch below).
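To illustrate the JSch suggestion, here is a rough sketch that opens an SSH session to the cluster's edge node and runs spark-submit there. The host, user, password, main class, and jar path are placeholders, and it assumes the JSch library (com.jcraft:jsch) is on the classpath:

import com.jcraft.jsch.{ChannelExec, JSch}
import scala.io.Source

// Placeholders: replace with the real edge-node host, credentials, class and jar.
val host = "cluster-edge-node"
val user = "hadoop"
val password = "secret"

val jsch = new JSch()
val session = jsch.getSession(user, host, 22)
session.setPassword(password)
session.setConfig("StrictHostKeyChecking", "no") // relaxed host-key checking, for the sketch only
session.connect()

val channel = session.openChannel("exec").asInstanceOf[ChannelExec]
channel.setCommand(
  "spark-submit --master yarn --deploy-mode cluster " +
  "--class com.example.MyApp /path/to/my-app.jar")
val output = channel.getInputStream // grab stdout before connecting
channel.connect()

// Print whatever spark-submit writes to stdout on the remote host.
Source.fromInputStream(output).getLines().foreach(println)

channel.disconnect()
session.disconnect()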
Source: https://stackoverflow.com/questions/43630494/scala-spark-connect-to-remote-cluster