Question
I'm trying to create a coordinator with a file-based dependency. The goal is for the coordinator to execute the workflow only once the specified file has been created; if the file does not exist yet, the coordinator should wait until it does. I have tried the following code:
<coordinator-app name="MY_APP" frequency="1440" start="2009-02-01T00:00Z" end="2009-02-07T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="input1" frequency="60" initial-instance="2009-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <done-flag>trigger.dat</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="input1">
      <start-instance>${coord:current(-23)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://localhost:9000/tmp/workflows</app-path>
    </workflow>
  </action>
</coordinator-app>
I started the Oozie job and it is in the WAITING state. I then ran the script that creates the file (trigger.dat) in the specified directory structure in HDFS (hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}). The file was created, but the status is still WAITING.
Could anyone help me with this?
Answer 1:
I changed the start and end dates, and it is working now.
The working coordinator.xml is:
<coordinator-app name="MY_APP" frequency="60" start="2015-01-12T05:00Z" end="2015-01-12T08:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="input1" frequency="30" initial-instance="2015-01-12T04:02Z" timezone="UTC">
      <uri-template>hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <done-flag>trigger.dat</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="input1">
      <start-instance>${coord:current(-1)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://localhost:9000/tmp/workflows</app-path>
      <configuration>
        <property>
          <name>property1</name>
          <value>${coord:dataIn('coordInput1')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
Some points I have observed:
The directory structure Oozie expects is derived from the initial-instance="2015-01-12T04:02Z" and frequency="30" of the dataset we define.
In my tests, the dataset was not considered by Oozie unless the property below was declared:
<property> <name>property1</name> <value>${coord:dataIn('coordInput1')}</value> </property>
Oozie works in the GMT/UTC time zone. When scheduling any workflow, keep GMT in mind and set the start and end times accordingly.
Until the directory (with its trigger.dat done-flag) is created, the coordinator job stays in the RUNNING state, while the coordinator action that would launch the workflow stays in the WAITING state.
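To make the first observation concrete, here is a rough Python sketch (not Oozie code) of which done-flag paths each coordinator action ends up waiting for. It assumes the rule as I read it in the Oozie coordinator specification, coord:current(n) = initial-instance + frequency * ((nominal time - initial-instance) div frequency + n), and that all frequencies are plain minutes. The helper names current_instance and expected_flags are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def current_instance(initial, ds_freq_min, nominal, n):
    """Approximate coord:current(n) for a synchronous, minute-based dataset.

    Assumption: instance = initial + freq * ((nominal - initial) div freq + n).
    """
    elapsed_min = int((nominal - initial).total_seconds() // 60)
    index = elapsed_min // ds_freq_min + n
    if index < 0:
        return None  # instance would fall before initial-instance
    return initial + timedelta(minutes=ds_freq_min * index)

def expected_flags(initial, ds_freq_min, nominal, start_n, end_n,
                   base="hdfs://localhost:9000/tmp/revenue_feed",
                   flag="trigger.dat"):
    """List the distinct done-flag paths for current(start_n)..current(end_n).

    The path layout mirrors the ${YEAR}/${MONTH}/${DAY}/${HOUR} uri-template.
    """
    paths = []
    for n in range(start_n, end_n + 1):
        inst = current_instance(initial, ds_freq_min, nominal, n)
        if inst is None:
            continue
        path = f"{base}/{inst:%Y/%m/%d/%H}/{flag}"
        if path not in paths:  # two 30-minute instances can share one hour directory
            paths.append(path)
    return paths

utc = timezone.utc
initial = datetime(2015, 1, 12, 4, 2, tzinfo=utc)  # dataset initial-instance
for hour in (5, 6, 7):  # nominal times between start=05:00Z and end=08:00Z
    nominal = datetime(2015, 1, 12, hour, 0, tzinfo=utc)
    print(nominal.strftime("%Y-%m-%dT%H:%MZ"),
          expected_flags(initial, 30, nominal, -1, 0))
```

Under these assumptions, the 05:00Z action waits for hdfs://localhost:9000/tmp/revenue_feed/2015/01/12/04/trigger.dat, i.e. the directory of the previous hour. Running the same helper on the coordinator from the question (hourly dataset, coord:current(-23) to coord:current(0)) yields 24 separate paths, so a single trigger.dat would likely not have been enough to release that action.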
Source: https://stackoverflow.com/questions/27863577/oozie-file-based-coordinator