Programmatically clear the state of airflow task instances

Submitted by 懵懂的女人 on 2019-12-24 00:59:32

Question


I want to clear the tasks in DAG B when DAG A completes execution. Both A and B are scheduled DAGs.

Is there any operator/way to clear the state of tasks and re-run DAG B programmatically?


I'm aware of the CLI option and Web UI option to clear the tasks.


Answer 1:


  • cli.py is an incredibly useful place to peek into the SQLAlchemy magic of Airflow.
  • The clear command is implemented here:
@cli_utils.action_logging
def clear(args):
    logging.basicConfig(
        level=settings.LOGGING_LEVEL,
        format=settings.SIMPLE_LOG_FORMAT)
    dags = get_dags(args)

    if args.task_regex:
        for idx, dag in enumerate(dags):
            dags[idx] = dag.sub_dag(
                task_regex=args.task_regex,
                include_downstream=args.downstream,
                include_upstream=args.upstream)

    DAG.clear_dags(
        dags,
        start_date=args.start_date,
        end_date=args.end_date,
        only_failed=args.only_failed,
        only_running=args.only_running,
        confirm_prompt=not args.no_confirm,
        include_subdags=not args.exclude_subdags,
        include_parentdag=not args.exclude_parentdag,
    )
  • Looking at the source, you can either
    • replicate it (assuming you also want to modify the functionality a bit)
    • or maybe just do from airflow.bin import cli and invoke the required functions directly
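Rather than shelling out, the clear can also be done in-process with the same model objects the CLI uses. A minimal sketch (Airflow 1.10.x assumed; `example_target_dag` is a hypothetical dag_id):

```python
from datetime import datetime

def clear_target_dag(dag_id, execution_date):
    """Clear all task instances of `dag_id` for one execution date,
    mirroring `airflow clear <dag_id> -s <date> -e <date> -c`."""
    # Imported lazily so the helper can be defined without a configured
    # Airflow environment.
    from airflow.models import DagBag

    dag = DagBag().get_dag(dag_id)
    dag.clear(start_date=execution_date, end_date=execution_date)

# Would need a live Airflow metadata DB to actually run:
# clear_target_dag('example_target_dag', datetime(2019, 12, 24))
```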



Answer 2:


Since my objective was to re-run DAG B whenever DAG A completes execution, I ended up clearing DAG B's tasks with a BashOperator:

# Clear the tasks in another dag
last_task = BashOperator(
    task_id='last_task',
    bash_command='airflow clear example_target_dag -c',
    dag=dag)

first_task >> last_task
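Note that `-c` skips the confirmation prompt and, with no date bounds, this clears every run of the target DAG. To clear only the run matching the current execution date, the command can be bounded with `-s`/`-e` using the templated `ds` macro; a sketch of building such a command:

```python
def clear_command(target_dag_id):
    """Build a Jinja-templated bash command for BashOperator that clears
    only the current run's logical date ({{ ds }} renders as YYYY-MM-DD)."""
    return 'airflow clear %s -c -s {{ ds }} -e {{ ds }}' % target_dag_id

# e.g. BashOperator(task_id='last_task',
#                   bash_command=clear_command('example_target_dag'),
#                   dag=dag)
```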



Answer 3:


I would recommend staying away from the CLI here!

The DAG/task functionality of Airflow is much better exposed by referencing the objects directly than by going through a BashOperator and/or the CLI module.

Add a PythonOperator to DAG A named "clear_dag_b" that imports dag_b from the dags folder (module) and runs this:

from dags.dag_b import dag as dag_b

def clear_dag_b(**context):
    exec_date = context['execution_date']
    dag_b.clear(start_date=exec_date, end_date=exec_date)

Important: if the start_date/end_date window does not match or overlap DAG B's scheduled times, the clear() operation will miss those DAG runs. This example assumes DAGs A and B are scheduled identically, and that you only want to clear day X in B when A executes day X.

It might make sense to check whether dag_b has already run before clearing:

dag_b_run = dag_b.get_dagrun(exec_date)  # returns None or a DagRun object
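Putting the check and the clear together, the callable might look like this (a sketch: `context['execution_date']` is the standard Airflow 1.x context key, and `dags.dag_b` is assumed to be DAG B's module path):

```python
def clear_dag_b(**context):
    # Assumed module path for DAG B, as in the snippet above; imported
    # lazily so the function can be defined without a configured Airflow.
    from dags.dag_b import dag as dag_b

    exec_date = context['execution_date']  # the run's logical date
    # Clear only if DAG B already has a run for this date; a cleared run
    # is picked up and re-executed by the scheduler automatically.
    if dag_b.get_dagrun(exec_date) is not None:
        dag_b.clear(start_date=exec_date, end_date=exec_date)
```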


Source: https://stackoverflow.com/questions/58180281/programmatically-clear-the-state-of-airflow-task-instances
