I am trying to run a simple MapReduce Java program using Spark on YARN (Cloudera Hadoop 5.2 on CentOS). I have tried this two different ways. The first way is the following:
The problem was solved by copying spark-assembly.jar into a directory on HDFS, where every node can read it, and then passing it to spark-submit through the --conf spark.yarn.jar parameter. The commands are listed below:
hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar
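If you prefer not to repeat the --conf flag on every submission, the same property can also be set once in spark-defaults.conf. A minimal sketch, assuming the default conf directory of the Spark 1.4.0 distribution used above and the same HDFS path:

# /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/conf/spark-defaults.conf
spark.yarn.jar hdfs:///user/spark/spark-assembly.jar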
If you are getting this error, it means you are uploading the assembly jar with the --jars option or manually copying it to HDFS on each node. I followed the approach below and it works for me.
In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to copy the assembly jar to every node manually (or pass it with --jars). It seems there are two versions of the same jar in your HDFS.
Try removing all the old jars from your .sparkStaging directory and try again; it should work.
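A sketch of that cleanup, assuming the default staging location under the submitting user's home directory on HDFS (replace <username> with the user that runs spark-submit):

hdfs dfs -ls /user/<username>/.sparkStaging
hdfs dfs -rm -r /user/<username>/.sparkStaging/application_*

Once the stale application directories are gone, resubmit the job without --jars so that spark-submit uploads a single, consistent copy of the assembly jar.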