I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH QuickStart 5.4.0 (with Spark v
I too struggled a lot with the spark-action in oozie. I set up the sharelib properly and tried to pass the appropriate jars using the --jars option within the <spark-opts> tags, but to no avail.
I always ended up with one error or another. The most I could manage was to run java/python spark jobs in local mode through the spark-action.
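For context, the shape of spark-action I was attempting looks roughly like the sketch below; treat it as an illustration only, since the action name, application jar, and the dependency jars passed via --jars are placeholders rather than my actual setup:

<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>MySparkApp</name>
        <!-- application jar and the extra --jars below are placeholder paths -->
        <jar>${nameNode}/user/ambari-qa/sparkAction/my-spark-app.jar</jar>
        <spark-opts>--jars ${nameNode}/user/ambari-qa/sparkAction/lib/dep1.jar,${nameNode}/user/ambari-qa/sparkAction/lib/dep2.jar</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>

In my hands, an action of this shape only ever completed successfully with the master set to local.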
However, I got all my spark jobs running in oozie, in all modes of execution, using the shell action. The major problem with the shell action is that shell jobs are deployed as the 'yarn' user. If you happen to deploy your oozie spark job from a user account other than yarn, you'll end up with a Permission Denied error (because that user would not be able to access the spark assembly jar copied into the /user/yarn/.sparkStaging directory). The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account name through which you deploy your oozie workflow.
Below is a workflow that illustrates this configuration. I deploy my oozie workflows from the ambari-qa user.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="spark-shell-wordcount">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>launcher2</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>/user/ambari-qa/sparkActionPython/hive-site.xml</value>
                </property>
            </configuration>
            <exec>/usr/hdp/current/spark-client/bin/spark-submit</exec>
            <argument>--master</argument>
            <argument>yarn-cluster</argument>
            <argument>wordcount.py</argument>
            <env-var>HADOOP_USER_NAME=ambari-qa</env-var>
            <file>/user/ambari-qa/sparkActionPython/wordcount.py#wordcount.py</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
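For completeness, wordcount.py is just an ordinary PySpark script; a minimal sketch (the input and output paths are placeholders, not necessarily the ones used above) would be:

# minimal PySpark word count; the HDFS input/output paths are placeholders
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

counts = (sc.textFile("/user/ambari-qa/sparkActionPython/input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("/user/ambari-qa/sparkActionPython/output")
sc.stop()

A workflow like this is then submitted with the standard oozie job CLI, driven by a job.properties that supplies ${jobTracker} and ${nameNode}; for example (host names, ports, and the application path are placeholders for your cluster):

nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
oozie.wf.application.path=${nameNode}/user/ambari-qa/sparkActionPython

oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run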
Hope this helps!