Airflow starts two DAG runs when turned on for the first time

被撕碎了的回忆 2021-01-06 15:42

When I boot up the Airflow webserver and scheduler for the first time on Oct 25th at around 17:23, and turn on my DAG, I can see that it kicks off two runs for Oct 23rd and

1 answer
  • 2021-01-06 15:53

    I am not sure, but this is my best guess:

    In short, this may simply be how Airflow is built, and a workaround would be to set your start_date to yesterday.

    TL;DR

    I agree that kicking off a single DAG run for 10-24 when you turn the DAG on would seem more natural.

    However, according to your DAG runs, run 1 is for 10-23. This suggests to me that the first run is not being initialized correctly, so I looked into the scheduler code.

    I have doubts about this line:

    https://github.com/apache/airflow/blob/68b8ec5f415795e4fa4ff7df35a3e75c712a7bad/airflow/jobs/scheduler_job.py#L603

    It is inside the function that creates a DAG run and sets the start date of that run.

    # The logic is that we move start_date up until
    # one period before, so that timezone.utcnow() is AFTER
    # the period end, and the job can be created...
    now = timezone.utcnow()

    # Next schedule point after now (roughly now + schedule_interval).
    # In your example, this is tomorrow.
    next_start = dag.following_schedule(now)

    # Previous schedule point before now (roughly now - schedule_interval).
    # In your example, this is yesterday.
    last_start = dag.previous_schedule(now)

    # tomorrow <= now is False, so the else branch runs.
    if next_start <= now:
        new_start = last_start
    else:
        # This is last_start - schedule_interval, i.e. 2 days ago.
        # I wonder whether dag.previous_schedule(next_start) was intended here?
        new_start = dag.previous_schedule(last_start)
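
To check this reading without running Airflow, here is a small standalone sketch of the same arithmetic. It is not the scheduler's code: `following_schedule`, `previous_schedule`, `first_run_start` and `runs_due` are hypothetical stand-ins, and it assumes a `schedule_interval` of `timedelta(days=1)`, so that the schedule helpers just add or subtract one day (a cron schedule would snap to cron boundaries instead). The year is arbitrary, since the question only gives Oct 25th at 17:23.

```python
from datetime import datetime, timedelta

# Assumption: schedule_interval=timedelta(days=1), not a cron expression.
SCHEDULE_INTERVAL = timedelta(days=1)


def following_schedule(dttm):
    """Hypothetical stand-in for dag.following_schedule with a timedelta interval."""
    return dttm + SCHEDULE_INTERVAL


def previous_schedule(dttm):
    """Hypothetical stand-in for dag.previous_schedule with a timedelta interval."""
    return dttm - SCHEDULE_INTERVAL


def first_run_start(now):
    """Mirror the branch quoted above to find where the first run starts."""
    next_start = following_schedule(now)
    last_start = previous_schedule(now)
    if next_start <= now:
        return last_start
    # Branch taken in this example: one more interval back from yesterday.
    return previous_schedule(last_start)


def runs_due(now):
    """Simplified model: a run is due once its period end is <= now."""
    runs = []
    execution_date = first_run_start(now)
    while following_schedule(execution_date) <= now:
        runs.append(execution_date)
        execution_date = following_schedule(execution_date)
    return runs


# The year is an arbitrary choice; the question gives only month, day and time.
now = datetime(2020, 10, 25, 17, 23)
new_start = first_run_start(now)
due = runs_due(now)

print(new_start)  # 2020-10-23 17:23:00
print(due)        # two execution dates: Oct 23rd and Oct 24th
```

Under these assumptions the first run lands two days back (Oct 23rd), and a second run for Oct 24th becomes due as soon as its period ends. That is consistent with two runs showing up right after the DAG is turned on.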
    