Airflow not scheduling correctly in Python

闹比i 2020-12-28 09:13

Code:

Python version 2.7.x, Airflow version 1.5.1.

My DAG script is:

from airflow import DAG
from air         


        
2 Answers
  • 2020-12-28 09:42

    Because the start date (2015-10-13 00:00) is earlier than the current time, Airflow triggers a backfill. Starting from 2015-10-13 00:00, the scheduler launches another run every time it checks the DAG, which is every few seconds. That launch time is the run's Start Date, whereas the Execution Dates are 5 minutes apart (the task's schedule interval).

    Look at the log file names:

    $ tree airflow/logs/testing/
    testing/
    |-- Orders10
    |   |-- 2015-10-13T00:00:00
    |   |-- 2015-10-13T00:05:00
    |   `-- 2015-10-13T00:10:00
    |-- Orders11
    |   |-- 2015-10-13T00:00:00
    |   |-- 2015-10-13T00:05:00
    |   `-- 2015-10-13T00:10:00
    |-- Orders12
    |   |-- 2015-10-13T00:00:00
    |   |-- 2015-10-13T00:05:00
    |   `-- 2015-10-13T00:10:00
    |-- Orders13
    |   |-- 2015-10-13T00:00:00
    |   |-- 2015-10-13T00:05:00
    |   `-- 2015-10-13T00:10:00
    |-- Orders14
    |   |-- 2015-10-13T00:00:00
    |   |-- 2015-10-13T00:05:00
    |   `-- 2015-10-13T00:10:00
    `-- Start1
        |-- 2015-10-13T00:00:00
        |-- 2015-10-13T00:05:00
        |-- 2015-10-13T00:10:00
        `-- 2015-10-13T00:15:00
    

    Now look at the creation times of the logs:

    $ll airflow/logs/testing/Start1
    -rw-rw-r-- 1 admin admin 4192 Nov  9 14:50 2015-10-13T00:00:00
    -rw-rw-r-- 1 admin admin 4192 Nov  9 14:50 2015-10-13T00:05:00
    -rw-rw-r-- 1 admin admin 4192 Nov  9 14:51 2015-10-13T00:10:00
    -rw-rw-r-- 1 admin admin 4192 Nov  9 14:52 2015-10-13T00:15:00
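
The two listings can be compared directly. Here is a rough sketch that pairs each Start1 log name (the Execution Date) with its creation time and prints the gaps between consecutive runs. The values are copied from the listings; the year of the creation times is an assumption, since `ll` does not show it.

```python
from datetime import datetime

# (log file name = Execution Date, log creation time) from the listings above.
# The year in the creation times is assumed; the listing only shows "Nov  9".
logs = [
    ("2015-10-13T00:00:00", "2015-11-09 14:50"),
    ("2015-10-13T00:05:00", "2015-11-09 14:50"),
    ("2015-10-13T00:10:00", "2015-11-09 14:51"),
    ("2015-10-13T00:15:00", "2015-11-09 14:52"),
]

runs = [
    (datetime.strptime(name, "%Y-%m-%dT%H:%M:%S"),
     datetime.strptime(created, "%Y-%m-%d %H:%M"))
    for name, created in logs
]

# Compare each run with the one before it.
for (prev_exec, prev_made), (this_exec, this_made) in zip(runs, runs[1:]):
    print("execution dates {0} apart, logs created {1} apart".format(
        this_exec - prev_exec, this_made - prev_made))
# execution dates 0:05:00 apart, logs created 0:00:00 apart
# execution dates 0:05:00 apart, logs created 0:01:00 apart
# execution dates 0:05:00 apart, logs created 0:01:00 apart
```

The Execution Dates advance in steady 5-minute steps, while the logs themselves were all created within about two minutes, which is what a backfill looks like.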
    

    You can also see these task instances in the web UI.

  • 2020-12-28 09:57

    For Code 2, I guess the reason why it runs every minute is:

    1. The start time is 2015-10-13 00:00

    2. The schedule interval is 5 minutes

    3. On every scheduler heartbeat (5 seconds by default), your DAG is checked:

      • First check: is start date + schedule interval < current time? (No last execution date exists yet.) If yes, the DAG is executed and the last execution time is recorded (e.g. 2015-10-13 00:00 + 5 min < current time?).
      • Second check, on the next heartbeat: is last execution time + schedule interval < current time? If so, the DAG is executed again.
      • ....
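
The heartbeat checks above can be sketched as a small loop. This is only a simplified model, not Airflow's scheduler code: `simulate_heartbeats` and its parameters are made-up names, and it assumes the date being tested is the next execution date (the start date on the first pass, then the last execution date plus one interval), which counts as due once a full interval has passed since it.

```python
from datetime import datetime, timedelta


def simulate_heartbeats(start_date, interval, first_check, heartbeat, checks):
    """Return (check_time, execution_date) for every run the model launches."""
    launched = []
    last_execution_date = None
    now = first_check
    for _ in range(checks):
        # Next date to try: the start date first, then one interval further.
        if last_execution_date is None:
            candidate = start_date
        else:
            candidate = last_execution_date + interval
        # Due only if the candidate's whole interval lies in the past.
        if candidate + interval <= now:
            launched.append((now, candidate))
            last_execution_date = candidate
        now += heartbeat
    return launched


# The scheduler is started one hour after the DAG's start date.
launched = simulate_heartbeats(
    start_date=datetime(2015, 10, 13, 0, 0),
    interval=timedelta(minutes=5),
    first_check=datetime(2015, 10, 13, 1, 0),
    heartbeat=timedelta(seconds=5),
    checks=3,
)
for check_time, execution_date in launched:
    print("{0} launched {1}".format(check_time.time(), execution_date.isoformat()))
# 01:00:00 launched 2015-10-13T00:00:00
# 01:00:05 launched 2015-10-13T00:05:00
# 01:00:10 launched 2015-10-13T00:10:00
```

Each heartbeat finds another overdue interval, so a run starts every few seconds until the execution date catches up with the clock.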

    The solution is to set the DAG's start_date to datetime.now() - schedule_interval.
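
A minimal sketch of that start date, using only the standard library. The `default_args` contents and the 'testing' DAG id in the comment are placeholders (the id is guessed from the log directory in the other answer), and the commented-out DAG call should be adapted to your own script and Airflow version.

```python
from datetime import datetime, timedelta

schedule_interval = timedelta(minutes=5)

# Start one interval in the past, so there is no older backlog to run through.
start_date = datetime.now() - schedule_interval

# Hypothetical default_args; hand them to your own DAG definition, e.g.
#   DAG('testing', default_args=default_args, schedule_interval=schedule_interval)
default_args = {
    'owner': 'airflow',
    'start_date': start_date,
}

print(start_date.isoformat())
```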

    If you want to debug this:

    1. Set LOGGING_LEVEL to debug in settings.py.

    2. Modify the is_queueable() method of airflow.models.TaskInstance as follows:

    def is_queueable(self, flag_upstream_failed=False):
        logging.debug('Checking whether task instance is queueable or not!')
        # A run is due only once a full schedule interval has passed since its execution_date.
        now = datetime.now()
        if self.execution_date > now - self.task.schedule_interval:
            logging.debug(
                'Too early to execute: execution_date {0} + task.schedule_interval {1} '
                '> datetime.now() {2}'.format(self.execution_date, self.task.schedule_interval, now))
            return False
        # ... rest of the original method unchanged
    