Oozie job won't run if using PySpark in SparkAction

陌清茗 2021-02-11 09:21

I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH Quickstart 5.4.0 (with Spark v

4 Answers
  •  遥遥无期
    2021-02-11 09:51

You should try configuring the Oozie Spark action to bring the needed files locally. You can do that with a `<file>` tag:

    
    ```xml
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>local[2]</master>
        <mode>client</mode>
        <name>${name}</name>
        <jar>my_pyspark_job.py</jar>
        <file>{path to your file on hdfs}/my_pyspark_job.py#my_pyspark_job.py</file>
    </spark>
    ```

    Explanation: an Oozie action runs inside a YARN container, which YARN allocates on a node that has available resources. Before running the action (which is actually "driver" code), YARN copies all needed files (jars, for example) locally to that node, into the folder allocated for the container's resources. So by adding a `<file>` tag to the Oozie action you "tell" it to bring `my_pyspark_job.py` locally to the node of execution.
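    The `source#link_name` convention in the `<file>` value can be sketched with a small helper (this is a hypothetical illustration of how the Hadoop distributed cache names the localized copy, not part of the Oozie API; `localized_name` is my own name):

    ```python
    def localized_name(file_entry):
        """Split an Oozie <file> value of the form ``hdfs_path#link_name``.

        Returns (source_path, link_name). When no ``#`` fragment is given,
        the link name defaults to the source file's base name, which is how
        the localized copy ends up addressable by a plain relative name.
        """
        source, sep, link = file_entry.partition("#")
        if not sep:
            # No explicit fragment: the local name is just the base name.
            link = source.rsplit("/", 1)[-1]
        return source, link
    ```

    So `{hdfs path}/my_pyspark_job.py#my_pyspark_job.py` is localized as plain `my_pyspark_job.py` in the container's working directory, which is why the `<jar>` element can refer to it by that relative name.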

    In my case I want to run a bash script (`run-hive-partitioner.bash`) which in turn runs a Python script (`hive-generic-partitioner.py`), so I need all of the files locally accessible on the node:

    
      
    ```xml
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>${appPath}/run-hive-partitioner.bash</exec>
        <argument>${db}</argument>
        <argument>${tables}</argument>
        <argument>${base_working_dir}</argument>
        <file>${appPath}/run-hive-partitioner.bash#run-hive-partitioner.bash</file>
        <file>${appPath}/hive-generic-partitioner.py#hive-generic-partitioner.py</file>
        <file>${appPath}/util.py#util.py</file>
    </shell>
    ```

    where `${appPath}` is `hdfs://ci-base.com:8020/app/oozie/util/wf-repair_hive_partitions`,

    so this is what I get in my job:

    ```
    Files in current dir:/hadoop/yarn/local/usercache/hdfs/appcache/application_1440506439954_3906/container_1440506439954_3906_01_000002/
    ======================
    File: hive-generic-partitioner.py
    File: util.py
    File: run-hive-partitioner.bash
    ...
    File: job.xml
    File: json-simple-1.1.jar
    File: oozie-sharelib-oozie-4.1.0.2.2.4.2-2.jar
    File: launch_container.sh
    File: oozie-hadoop-utils-2.6.0.2.2.4.2-2.oozie-4.1.0.2.2.4.2-2.jar
    ```

    As you can see, Oozie (or actually YARN, I think) shipped all the needed files locally to the container's temp folder, and now it's able to run them.
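    Because everything is localized into the container's working directory, the scripts can rely on plain relative names. A minimal sketch of that check, as the top of `hive-generic-partitioner.py` might do it (`missing_localized_files` is my own hypothetical helper, not anything Oozie provides):

    ```python
    import os

    def missing_localized_files(required, workdir="."):
        """Return the subset of `required` file names not present in `workdir`.

        After YARN localizes the <file> entries, each one appears under its
        link name directly in the container's working directory, so a check
        against bare relative names is sufficient.
        """
        return [name for name in required
                if not os.path.exists(os.path.join(workdir, name))]
    ```

    Calling `missing_localized_files(["util.py", "run-hive-partitioner.bash"])` inside the container should return an empty list when the `<file>` tags are set up correctly, and naming whatever was forgotten otherwise.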
