Question
I am working on automating Pig jobs using Oozie in a Hadoop cluster.
I was able to run a sample Pig script from Oozie, but my next requirement is to run a Pig job where the Pig script receives its input parameters from a shell script. Please share your thoughts.
Answer 1:
UPDATE:
OK, to make the original question clear: how can you pass a parameter from a shell script's output? Here's a working example:
WORKFLOW.XML
<workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
    <start to='shell1' />
    <action name='shell1'>
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>so.sh</exec>
            <argument>A</argument>
            <argument>B</argument>
            <file>so.sh</file>
            <capture-output/>
        </shell>
        <ok to="shell2" />
        <error to="fail" />
    </action>
    <action name='shell2'>
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>so2.sh</exec>
            <argument>${wf:actionData('shell1')['out']}</argument>
            <file>so2.sh</file>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
SO.SH
echo "out=test"
SO2.SH
echo "I'm so2.sh and I get the following param:"
echo "$1"
If you replace the second shell action with your Pig action and pass the parameter to the Pig script like this:
...
<param>MY_PARAM=${wf:actionData('shell1')['out']}</param>
...
Then your original question is solved.
Regarding your sharelib issue: make sure that the job properties define LIB_PATH=where/you/jars/are, and hand this parameter over to the Pig action,
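For illustration, here is a sketch of what the replacement action might look like inside the workflow above. The action name pig1 and the script name my_script.pig are hypothetical placeholders, not taken from the original setup; the queue property simply mirrors the shell actions. The <ok to="shell2" /> transition of shell1 would also need to point at the new action name.

```xml
<!-- Sketch only: 'pig1' and 'my_script.pig' are assumed names. -->
<action name='pig1'>
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>my_script.pig</script>
        <!-- Value captured from the stdout of the 'shell1' action -->
        <param>MY_PARAM=${wf:actionData('shell1')['out']}</param>
    </pig>
    <ok to="end" />
    <error to="fail" />
</action>
```

Inside the Pig script, the value should then be available as $MY_PARAM through Pig's normal parameter substitution.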
<param>LIB_PATH=${LIB_PATH}</param>
then just register the jars from there:
REGISTER '$LIB_PATH/my_jar'
==========================================================================
What you are looking for is the
Map wf:actionData(String node)
This function is only applicable to action nodes that produce output data on completion.
The output data is in a Java Properties format and via this EL function it is available as a Map.
See the Oozie documentation for details. Here's a nice example: http://www.infoq.com/articles/oozieexample (you actually have to capture the output, as Samson wrote in the comments)
Some more details: "If the capture-output element is present, it indicates Oozie to capture output of the STDOUT of the shell command execution. The Shell command output must be in Java Properties file format and it must not exceed 2KB. From within the workflow definition, the output of an Shell action node is accessible via the String action:output(String node, String key) function (Refer to section '4.2.6 Action EL Functions')."
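Because the captured output is read as Java Properties, a script can presumably emit several key=value lines, and each key can then be looked up separately. A small sketch, assuming a shell action named shell1 whose script prints the two hypothetical keys input_dir and run_date:

```xml
<!-- Assumed stdout of the shell script (hypothetical keys and values):
       input_dir=/data/in
       run_date=2016-01-22
     Each key is then read from the map returned by wf:actionData. -->
<param>INPUT_DIR=${wf:actionData('shell1')['input_dir']}</param>
<param>RUN_DATE=${wf:actionData('shell1')['run_date']}</param>
```

Anything else the script writes to STDOUT counts toward the 2KB limit quoted above, so it is probably safest to print only the key=value lines.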
Or you can use a not-so-nice but simple workaround: execute your shell command in the Pig script itself, save its result in a variable, and use that. Like this:
%declare MY_VAR `echo "/abc/cba"`
A = LOAD '$MY_VAR' ...
But this is not nice at all; the first solution is the recommended one.
Source: https://stackoverflow.com/questions/34943757/submit-pig-job-from-oozie