Is it possible for Airflow scheduler to first finish the previous day's cycle before starting the next?

暗喜 2021-02-02 16:50

Right now, nodes in my DAG proceed to the next day's task before the rest of the nodes of that DAG finish. Is there a way for it to wait for the rest of the DAG to finish be

3 Answers
  • 2021-02-02 17:22

    Might be a bit late for this answer, but I ran into the same issue and resolved it by adding two extra tasks to each DAG: "Previous" at the start and "Complete" at the end. The Previous task is an ExternalTaskSensor that monitors the previous run of the same DAG. Complete is just a DummyOperator. Let's say the DAG runs every 30 minutes; it would then look like this:

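    from datetime import datetime, timedelta
    
    # Import paths below assume Airflow 2.x (the original post showed no imports).
    # On 1.10 the modules are airflow.operators.bash_operator,
    # airflow.operators.dummy_operator and airflow.sensors.external_task_sensor.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.dummy import DummyOperator
    from airflow.sensors.external_task import ExternalTaskSensor
    
    # Placeholder values: the original post did not show its default_args.
    default_args = {'owner': 'airflow', 'start_date': datetime(2021, 1, 1)}
    
    dag = DAG(dag_id='TEST_DAG', default_args=default_args, schedule_interval=timedelta(minutes=30))
    
    # Waits until the final task of the previous run (30 minutes earlier) has succeeded.
    PREVIOUS = ExternalTaskSensor(
        task_id='Previous_Run',
        external_dag_id='TEST_DAG',
        external_task_id='All_Tasks_Completed',
        allowed_states=['success'],
        execution_delta=timedelta(minutes=30),  # must match the schedule interval
        dag=dag  # the DAG instance, not the DAG class
    )
    
    T1 = BashOperator(
        task_id='TASK_01',
        bash_command='echo "Hello World from Task 1"',
        dag=dag
    )
    
    # Marker task that the next run's sensor looks for.
    COMPLETE = DummyOperator(
        task_id='All_Tasks_Completed',
        dag=dag
    )
    
    PREVIOUS >> T1 >> COMPLETE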
    

    So the next DAG run, even though it enters the queue, will not let its tasks run until its PREVIOUS sensor has succeeded, which only happens once the previous run has reached All_Tasks_Completed.
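To make it clearer which run the sensor ends up watching, here is a minimal sketch using only the standard library. It assumes, as the sensor's `execution_delta` parameter is documented to work, that the delta is subtracted from the current run's execution date. The helper name `watched_execution_date` and the sample dates are made up for illustration and are not part of Airflow.

```python
from datetime import datetime, timedelta


def watched_execution_date(execution_date, execution_delta):
    """Execution date of the run that the sensor waits on.

    Hypothetical helper: it only mirrors the subtraction that
    ExternalTaskSensor applies when execution_delta is set.
    """
    return execution_date - execution_delta


schedule = timedelta(minutes=30)

# Four consecutive runs of a DAG scheduled every 30 minutes (sample dates).
first_run = datetime(2021, 2, 2, 16, 0)
runs = [first_run + i * schedule for i in range(4)]

for run in runs:
    target = watched_execution_date(run, schedule)
    # The very first run points at a date for which no run exists.
    print(run.time(), "waits on", target.time(), "| run exists:", target in runs)
```

Every run except the first points at its direct predecessor. The first run points at a date with no run at all, so with this pattern the first sensor probably needs to be handled separately, for example by marking it as successful by hand.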

  • 2021-02-02 17:22

    What ended up working for me is a combination of

    1. Adding task dependencies: wait_for_downstream=True, depends_on_past=True
    2. Adding max_active_runs=1 when creating the DAG. I did try to add max_active_runs as a default argument, but that did not work.
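The two steps above could be sketched roughly as follows, assuming Airflow 2.x import paths. Entries in `default_args` are forwarded to the operators, while `max_active_runs` is an argument of the DAG itself, which would explain why putting it into the default arguments had no effect. The DAG id, task ids, owner, start date and the `build_dag` helper are placeholders invented for this sketch.

```python
from datetime import datetime, timedelta

# Task-level settings: default_args are passed on to every operator in the DAG.
default_args = {
    "owner": "airflow",                  # placeholder
    "start_date": datetime(2021, 1, 1),  # placeholder
    "depends_on_past": True,       # a task waits for its own previous instance
    "wait_for_downstream": True,   # ...and for that instance's direct downstream tasks
}

# DAG-level settings: max_active_runs belongs here, not in default_args.
dag_kwargs = {
    "dag_id": "daily_cycle",  # hypothetical name
    "default_args": default_args,
    "schedule_interval": timedelta(days=1),
    "max_active_runs": 1,     # at most one active run of this DAG at a time
}


def build_dag():
    """Create the DAG. Airflow is imported lazily so that the settings
    above can be inspected on a machine without Airflow installed."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(**dag_kwargs)
    extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
    load = BashOperator(task_id="load", bash_command="echo load", dag=dag)
    extract >> load
    return dag
```

In an actual DAG file the result would be assigned at module level, for example `dag = build_dag()`, so that the scheduler can discover it.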
  • 2021-02-02 17:23

    If you just want to run one instance at a time, try setting max_active_runs=1 on the DAG.
