Oozie job won't run if using PySpark in SparkAction

闹比i 2021-02-11 09:23

I've encountered several examples of SparkAction jobs in Oozie, and most of them are in Java. I edited one a little and ran the example in Cloudera CDH Quickstart 5.4.0 (with Spark v

4 Answers
  •  被撕碎了的回忆
    2021-02-11 09:48

    I too struggled a lot with the spark-action in oozie. I set up the sharelib properly and tried to pass the appropriate jars using the --jars option within the tags, but to no avail.
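    (For what it's worth, checking and refreshing the sharelib can be done from the Oozie CLI. The commands below are only a sketch; the Oozie server URL is a placeholder for your own cluster.)

        # List the jars Oozie has registered for the spark sharelib
        oozie admin -oozie http://localhost:11000/oozie -shareliblist spark

        # After copying new jars into the sharelib directory on HDFS,
        # tell Oozie to pick them up without a restart
        oozie admin -oozie http://localhost:11000/oozie -sharelibupdate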

    I always ended up getting one error or another. The most I could do was run all Java/Python Spark jobs in local mode through the spark-action.

    However, I got all my spark jobs running in oozie in all modes of execution using the shell action. The major problem with the shell action is that shell jobs are deployed as the 'yarn' user. If you happen to deploy your oozie spark job from a user account other than yarn, you'll end up with a Permission Denied error (because the user would not be able to access the spark assembly jar copied into the /user/yarn/.sparkStaging directory). The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account name through which you deploy your oozie workflow.
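    (An equivalent approach, if you wrap spark-submit in your own script instead of calling it directly from the exec element, is to export the variable at the top of that script. This is only a sketch; the script name and paths are illustrative.)

        #!/bin/bash
        # submit_wordcount.sh (illustrative wrapper, shipped to the action via a file element)
        # Export the owning user so Spark's staging files go under /user/ambari-qa
        # instead of /user/yarn, avoiding the Permission Denied error described above.
        export HADOOP_USER_NAME=ambari-qa

        /usr/hdp/current/spark-client/bin/spark-submit \
            --master yarn-cluster \
            wordcount.py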

    Below is a workflow that illustrates this configuration. I deploy my oozie workflows from the ambari-qa user.
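    (Deploying it is the standard Oozie CLI submit: copy the workflow directory to HDFS and run the job from the ambari-qa account. The job.properties sketch below uses placeholder host names and an assumed application path; adapt them to your cluster.)

        # job.properties (illustrative values)
        nameNode=hdfs://your-namenode-host:8020
        jobTracker=your-resourcemanager-host:8050
        oozie.use.system.libpath=true
        oozie.wf.application.path=${nameNode}/user/ambari-qa/sparkActionPython

        # Submit the workflow as ambari-qa
        oozie job -oozie http://your-oozie-host:11000/oozie -config job.properties -run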

    
        
        
            
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="sparkActionPython">
        <start to="spark_shell_node"/>
        <action name="spark_shell_node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>oozie.launcher.mapred.job.queue.name</name>
                        <value>launcher2</value>
                    </property>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>default</value>
                    </property>
                    <property>
                        <name>oozie.hive.defaults</name>
                        <value>/user/ambari-qa/sparkActionPython/hive-site.xml</value>
                    </property>
                </configuration>
                <exec>/usr/hdp/current/spark-client/bin/spark-submit</exec>
                <argument>--master</argument>
                <argument>yarn-cluster</argument>
                <argument>wordcount.py</argument>
                <env-var>HADOOP_USER_NAME=ambari-qa</env-var>
                <file>/user/ambari-qa/sparkActionPython/wordcount.py#wordcount.py</file>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

    Hope this helps!
