“Beeline command not found” error while executing beeline command from python script (called from oozie shell action)

Submitted by 爷,独闯天下 on 2020-01-25 09:50:05

Question


I have a Python script that I want to schedule with Oozie. I use an Oozie shell action to invoke the script, and the script runs a beeline command. When I run the Oozie workflow, I get the error "sh: beeline: command not found". If I run the script, or just the beeline command, manually from the edge node, it works fine. My data platform is Hortonworks 2.6. Below are my workflow.xml and Python script:

Workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.3" name="hive2-wf">
    <credentials>
        <credential name='hcat-creds' type='hcat'>
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://host:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@SOLON.PRD</value>
            </property>
           </credential>
    </credentials>
    <start to="python-node"/>
    <action name="python-node" cred="hcat-creds">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>run_validations.py</exec>
            <argument>--jdbcURL</argument><argument>${jdbcURL}</argument>
            <argument>--jdbcPrincipal</argument><argument>${jdbcPrincipal}</argument> 
            <env-var>PYTHONPATH=/bin/python</env-var>
            <env-var>PYTHON_EGG_CACHE=/tmp</env-var>
            <env-var>PATH=/usr/bin</env-var>
            <env-var>HADOOP_CLASSPATH=${HADOOP_CLASSPATH}</env-var>
            <file>run_validations.py</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Script:

#!/usr/bin/env python2

import sys, os, commands, datetime, time ,getpass, errno
from optparse import OptionParser
import subprocess
from subprocess import Popen, PIPE

def arg_handle():
    usage = "usage: run_validations.py [options]"
    parser = OptionParser(usage)
    parser.add_option("-u", "--jdbcURL",dest="jdbcURL",help="jdbcURL")
    parser.add_option("-p", "--jdbcPrincipal",dest="jdbcPrincipal",help="jdbcPrincipal") 

    (options, args) = parser.parse_args()
    print("run_validations.py  -> Input      : " + str(options))
    return options

def main():
    print("run_validations.py  -> Started Run_validations.py")
    options = arg_handle()

    print("JDBC URL : "+options.jdbcURL)
    print("JDBC PRINCIPAL : "+options.jdbcPrincipal)

    beeline_connection = options.jdbcURL+";principal="+options.jdbcPrincipal
    hive_cmd = 'beeline -u "'+beeline_connection+'" -e "select 1+2;"'
    print("Invoked :"+hive_cmd)
    rc,out =  commands.getstatusoutput(hive_cmd)
    if(rc==0):
        print("RC : "+str(rc))
        print("Output :")
        print(out)
    else:
        print("RC : "+str(rc))
        print("Output :")
        print(out)

if __name__ == "__main__":
    main()

Output

>>> Invoking Shell command line now >>

Stdoutput run_validations.py  -> Started Run_validations.py
Stdoutput run_validations.py  -> Input      : {'jdbcURL': 'jdbc:hive2://host:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2', 'jdbcPrincipal': 'hive/_HOST@SOLON.PRD'}
Stdoutput Invoked :beeline -u "jdbc:hive2://host:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@SOLON.PRD" -e "select 1+2;"
Stdoutput RC : 32512
Stdoutput Output :
Stdoutput sh: beeline: command not found
Exit code of the Shell command 0
<<< Invocation of Shell command completed <<<

Could someone please tell me what it is that I am missing?


Answer 1:


Oozie executes a shell action on some other node in the Hadoop cluster (possibly one of the data nodes), not on the edge node where you tested beeline and the Python script. Beeline is evidently installed on the edge node, which is why your manual test works.

The actual problem is that the node where the shell action runs does not appear to have beeline installed. If you have access to that node, you can log in and check whether beeline is there.
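If logging in to the worker node is not possible, the same check can be made from inside the shell action itself. The following is only a minimal sketch, not part of the original script: the helper names `find_executable` and `describe_environment` are made up for illustration. It prints the host the action landed on, the `PATH` it sees, and where (if anywhere) `beeline` resolves. Note that the workflow in the question sets `PATH=/usr/bin`, so only that directory would be searched.

```python
import os
import socket


def find_executable(name, search_path=None):
    """Return the full path of `name` if it is an executable file on
    `search_path` (defaults to the PATH environment variable), else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def describe_environment(command="beeline"):
    """Collect the host name, the PATH, and the resolved location of `command`."""
    return {
        "host": socket.gethostname(),
        "path": os.environ.get("PATH", ""),
        "location": find_executable(command),
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Host    : " + info["host"])
    print("PATH    : " + info["path"])
    print("beeline : " + str(info["location"]))
```

Running this as the `<exec>` of a throwaway shell action should show in the Oozie launcher log which node ran it and whether `beeline` is visible there. A `beeline : None` line would be consistent with the diagnosis above.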

I would suggest combining Hive actions and shell actions to accomplish what you are trying to do.

See: Oozie > what is the difference between asynchronous actions and synchronous actions
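A side observation on the log in the question: `RC : 32512` is a raw wait status, which is what Python 2's `commands.getstatusoutput` returns. It decodes to exit code 127, the shell's "command not found". The script then still exits with 0 ("Exit code of the Shell command 0"), so Oozie treats the action as successful. The sketch below shows one way the script could surface the failure. It assumes nothing beyond the standard library, and the function names `exit_code_from_wait_status`, `run_command` and `main` are illustrative, not taken from the original script.

```python
import os
import subprocess


def exit_code_from_wait_status(status):
    """Decode a raw wait status (as returned by Python 2's
    commands.getstatusoutput) into a shell-style exit code. POSIX only."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 1  # terminated abnormally (e.g. by a signal); report a failure


def run_command(argv):
    """Run `argv` without a shell and return (exit_code, combined_output)."""
    try:
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
    except OSError as exc:
        # Raised when the executable cannot be found or started;
        # 127 mirrors the shell's "command not found" convention.
        return 127, str(exc)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")


def main(argv):
    """Run the command, print its result, and return its exit code."""
    rc, out = run_command(argv)
    print("RC : " + str(rc))
    print("Output :")
    print(out)
    return rc
```

Ending the script with `sys.exit(main([...]))`, where the list holds the beeline command and its arguments, would pass a non-zero code back to Oozie, so the shell action takes its `<error>` transition instead of `<ok>` when beeline cannot be run.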



Source: https://stackoverflow.com/questions/58992680/beeline-command-not-found-error-while-executing-beeline-command-from-python-sc
