I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH QuickStart 5.4.0 (with Spark v
I too struggled a lot with the spark-action in oozie. I set up the sharelib properly and tried to pass the appropriate jars using the --jars option within the <spark-opts> tags, but to no avail.
I always ended up with one error or another. The most I could manage was to run java/python spark jobs in local mode through the spark-action.
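For context, the shape of spark-action I was attempting looks roughly like the sketch below; treat it as an illustration only, since the action name, application jar, and the dependency jars passed via --jars are placeholders rather than my actual setup:

<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>MySparkApp</name>
        <!-- application jar and the extra --jars below are placeholder paths -->
        <jar>${nameNode}/user/ambari-qa/sparkAction/my-spark-app.jar</jar>
        <spark-opts>--jars ${nameNode}/user/ambari-qa/sparkAction/lib/dep1.jar,${nameNode}/user/ambari-qa/sparkAction/lib/dep2.jar</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>

In my hands, an action of this shape only ever completed successfully with the master set to local.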
However, I got all my spark jobs running in oozie, in all modes of execution, using the shell action. The major problem with the shell action is that shell jobs are deployed as the 'yarn' user. If you happen to deploy your oozie spark job from a user account other than yarn, you'll end up with a Permission Denied error (because that user would not be able to access the spark assembly jar copied into the /user/yarn/.sparkStaging directory). The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account name through which you deploy your oozie workflow.
Below is a workflow that illustrates this configuration. I deploy my oozie workflows from the ambari-qa user.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="spark-shell-wordcount">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>launcher2</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/user/ambari-qa/sparkActionPython/hive-site.xml</value>
                </property>
            </configuration>
            <exec>/usr/hdp/current/spark-client/bin/spark-submit</exec>
            <argument>--master</argument>
            <argument>yarn-cluster</argument>
            <argument>wordcount.py</argument>
            <env-var>HADOOP_USER_NAME=ambari-qa</env-var>
            <file>/user/ambari-qa/sparkActionPython/wordcount.py#wordcount.py</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
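For completeness, wordcount.py is just an ordinary PySpark script; a minimal sketch (the input and output paths are placeholders, not necessarily the ones used above) would be:

# minimal PySpark word count; the HDFS input/output paths are placeholders
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

counts = (sc.textFile("/user/ambari-qa/sparkActionPython/input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("/user/ambari-qa/sparkActionPython/output")
sc.stop()

A workflow like this is then submitted with the standard oozie job CLI, driven by a job.properties that supplies ${jobTracker} and ${nameNode}; for example (host names, ports, and the application path are placeholders for your cluster):

nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
oozie.wf.application.path=${nameNode}/user/ambari-qa/sparkActionPython

oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run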
Hope this helps!