I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH Quickstart 5.4.0 (with Spark v
You should try configuring the Oozie Spark action to bring the needed files locally. You can do this using a file tag:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>local[2]</master>
    <mode>client</mode>
    <name>${name}</name>
    <jar>my_pyspark_job.py</jar>
    <file>{path to your file on hdfs}/my_pyspark_job.py#my_pyspark_job.py</file>
</spark>
Explanation: the Oozie action runs inside a YARN container, which YARN allocates on a node that has available resources. Before running the action (which is actually the "driver" code), it copies all the needed files (jars, for example) to that node, into the folder allocated for the YARN container's resources. So by adding the file tag to your Oozie action you are telling it to bring my_pyspark_job.py locally to the node of execution.
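The part after the # sets the name of the local symlink; without it, the file's basename is used. A small illustrative helper (hypothetical, not part of any Oozie API) showing how such a value splits:

```python
def split_file_tag(value):
    # A <file> value has the form "<hdfs path>#<local name>".
    path, sep, link = value.partition("#")
    if not sep:  # no '#': the local symlink gets the file's basename
        link = path.rsplit("/", 1)[-1]
    return path, link
```

For example, split_file_tag("/app/my_pyspark_job.py#my_pyspark_job.py") gives ("/app/my_pyspark_job.py", "my_pyspark_job.py").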
In my case I want to run a bash script (run-hive-partitioner.bash) which in turn runs a Python script (hive-generic-partitioner.py), so I need all the files locally accessible on the node:
<shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>${appPath}/run-hive-partitioner.bash</exec>
    <argument>${db}</argument>
    <argument>${tables}</argument>
    <argument>${base_working_dir}</argument>
    <file>${appPath}/run-hive-partitioner.bash#run-hive-partitioner.bash</file>
    <file>${appPath}/hive-generic-partitioner.py#hive-generic-partitioner.py</file>
    <file>${appPath}/util.py#util.py</file>
</shell>
where ${appPath} is hdfs://ci-base.com:8020/app/oozie/util/wf-repair_hive_partitions
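Since each file is shipped under its own name, the three file values above follow one pattern. A throwaway sketch (the helper is made up; the path and file names are the ones from this workflow) that builds them:

```python
# The resolved value of ${appPath} in this workflow.
APP_PATH = "hdfs://ci-base.com:8020/app/oozie/util/wf-repair_hive_partitions"

def file_tag(app_path, name):
    # "<hdfs path>#<symlink>": ship the HDFS file and expose it locally
    # under the name after '#'.
    return "{0}/{1}#{1}".format(app_path, name)

tags = [file_tag(APP_PATH, n)
        for n in ("run-hive-partitioner.bash",
                  "hive-generic-partitioner.py",
                  "util.py")]
```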
So this is what I get in my job:
Files in current dir:/hadoop/yarn/local/usercache/hdfs/appcache/application_1440506439954_3906/container_1440506439954_3906_01_000002/
======================
File: hive-generic-partitioner.py
File: util.py
File: run-hive-partitioner.bash
...
File: job.xml
File: json-simple-1.1.jar
File: oozie-sharelib-oozie-4.1.0.2.2.4.2-2.jar
File: launch_container.sh
File: oozie-hadoop-utils-2.6.0.2.2.4.2-2.oozie-4.1.0.2.2.4.2-2.jar
As you can see, Oozie (or actually YARN, I think) shipped all the needed files locally to the temp folder, and now it's able to run them.