Question
Using the Hortonworks HDP 2.3 preview sandbox (oozie:4.2.0.2.3.0.0-2130, spark:1.3 and Hadoop:2.7.1.2.3.0.0-2130), I am trying to invoke the oozie spark action using "yarn-cluster" as the master. The example provided in Oozie Spark Action is for running the spark action on "local" master.
The same page also suggests that, to run on YARN, the Spark assembly jar should be available to the Spark action.
I have two questions
- How do we make the spark assembly jar available to Spark Action? Should I use the jar element in the oozie spark action?
I get the following error when I submit the job without adding the assembly jar explicitly
Using properties file: null
Using properties file: null
Parsed arguments:
  master                  yarn-master
  deployMode              cluster
  executorMemory          512m
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  extraSparkProperties    Map()
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            3
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.foo.bar.spark.examples.WordCountSparkJob
  primaryResource         hdfs://sandbox.hortonworks.com:8020/apps/foo/sandbox.hortonworks.com/1.201-SNAPSHOT/oozieapp/lib/abc-1.201-SNAPSHOT.jar
  name                    Spark Example
  childArgs               [inputpath=hdfs://sandbox.hortonworks.com:8020/tmp/bcp_examples/input/]
  jars                    null
  verbose                 true
Default properties from null:
Error: Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output
Intercepting System.exit(-1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [-1]
Appreciate any pointers on how to solve the problem.
Answer 1:
The default sharelib for the Spark action that ships with Oozie in HDP 2.3 is not built with YARN support.
If you installed Spark through the Hortonworks distribution, you can replace the contents of the Spark action's sharelib with the installed version.
For example (as the oozie user):
hadoop fs -mv /user/oozie/share/lib/spark /user/oozie/share/lib/spark-bak
hadoop fs -mkdir /user/oozie/share/lib/spark
hadoop fs -put /usr/hdp/current/spark-client/lib/* /user/oozie/share/lib/spark
hadoop fs -cp /user/oozie/share/lib/spark-bak/oozie* /user/oozie/share/lib/spark
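The four commands above can be wrapped in a small script so that they can be previewed before anything in HDFS is touched. This is only a sketch: it assumes the same paths as the commands above, and the names DRY_RUN, run and replace_spark_sharelib are hypothetical helpers introduced here for illustration, not part of Oozie or Hadoop.

```shell
# Sketch only: the same four steps as above, with an optional dry-run mode.
# SHARELIB and SPARK_LIB default to the paths used in the answer.
# DRY_RUN, run and replace_spark_sharelib are illustrative names.
SHARELIB="${SHARELIB:-/user/oozie/share/lib/spark}"
SPARK_LIB="${SPARK_LIB:-/usr/hdp/current/spark-client/lib}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

replace_spark_sharelib() {
  # Keep the original sharelib as a backup.
  run hadoop fs -mv "$SHARELIB" "${SHARELIB}-bak" || return 1
  run hadoop fs -mkdir "$SHARELIB" || return 1
  # Local glob: expanded by the shell before hadoop sees it.
  run hadoop fs -put "$SPARK_LIB"/* "$SHARELIB" || return 1
  # HDFS glob: quoted so that hadoop fs expands it, not the local shell.
  run hadoop fs -cp "${SHARELIB}-bak/oozie*" "$SHARELIB" || return 1
}
```

Setting DRY_RUN=1 before calling replace_spark_sharelib prints the commands instead of running them; without it, the function stops at the first step that fails, so a failed backup never leads to an overwritten sharelib.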
Answer 2:
This error occurs because the class org.apache.spark.deploy.yarn.Client cannot be loaded. That class is contained in the spark-assembly jar, which can be found in /usr/hdp/current/spark-client/lib/. After you add this file to hdfs://hd-host:port/user/oozie/share/lib/spark, you have to restart Oozie for the change to take effect immediately.
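Before copying anything, it can help to confirm which jar in the local Spark lib directory is the assembly jar. A minimal sketch, assuming the jar file name starts with spark-assembly as described above; find_assembly_jar is a hypothetical helper name, not an existing tool.

```shell
# Sketch: print the spark-assembly jar(s) found in a local Spark lib directory.
# The default directory is the one named in the answer; find_assembly_jar is
# an illustrative name.
find_assembly_jar() {
  dir="${1:-/usr/hdp/current/spark-client/lib}"
  found=1
  for jar in "$dir"/spark-assembly*.jar; do
    # When nothing matches, the pattern stays literal; skip it.
    [ -e "$jar" ] || continue
    echo "$jar"
    found=0
  done
  # Exit status 0 if at least one jar was found, 1 otherwise.
  return "$found"
}
```

The printed path can then be passed to hadoop fs -put with /user/oozie/share/lib/spark as the destination. If unzip is installed, listing the jar with unzip -l and searching for org/apache/spark/deploy/yarn/Client.class should confirm that the YARN client class is actually inside it.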
Source: https://stackoverflow.com/questions/30904316/sparkaction-for-yarn-cluster