Question
I want to clear the tasks in DAG B when DAG A completes execution. Both A and B are scheduled DAGs.
Is there any operator/way to clear the state of tasks and re-run DAG B programmatically?
I'm aware of the CLI option and Web UI option to clear the tasks.
Answer 1:
- cli.py is an incredibly useful place to peep into the SQLAlchemy magic of Airflow.
- The clear command is implemented here:
@cli_utils.action_logging
def clear(args):
    logging.basicConfig(
        level=settings.LOGGING_LEVEL,
        format=settings.SIMPLE_LOG_FORMAT)
    dags = get_dags(args)

    if args.task_regex:
        for idx, dag in enumerate(dags):
            dags[idx] = dag.sub_dag(
                task_regex=args.task_regex,
                include_downstream=args.downstream,
                include_upstream=args.upstream)

    DAG.clear_dags(
        dags,
        start_date=args.start_date,
        end_date=args.end_date,
        only_failed=args.only_failed,
        only_running=args.only_running,
        confirm_prompt=not args.no_confirm,
        include_subdags=not args.exclude_subdags,
        include_parentdag=not args.exclude_parentdag,
    )
- Looking at the source, you can either
  - replicate it (assuming you also want to modify the functionality a bit), or
  - just do from airflow.bin import cli and invoke the required functions directly, as sketched below
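A minimal sketch of the replicate-it route, going through the model objects rather than the CLI wrapper (this assumes Airflow 1.x, where DagBag and DAG live in airflow.models; 'example_target_dag' is a placeholder dag id):

from airflow.models import DagBag, DAG

# Parse the dags folder, the same way the CLI entry point does
dagbag = DagBag()
target_dag = dagbag.get_dag('example_target_dag')  # placeholder dag id

# The same call the clear command ends up making, with the
# interactive confirmation prompt disabled
DAG.clear_dags(
    [target_dag],
    only_failed=False,
    only_running=False,
    confirm_prompt=False,
    include_subdags=True,
)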
Answer 2:
Since my objective was to re-run DAG B whenever DAG A completes execution, I ended up clearing DAG B using a BashOperator:
# Clear the tasks in another DAG (-c / --no_confirm skips the confirmation prompt)
last_task = BashOperator(
    task_id='last_task',
    bash_command='airflow clear example_target_dag -c',
    dag=dag)
first_task >> last_task
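Note that -c is the short form of --no_confirm in the Airflow 1.x CLI. On Airflow 2.x the subcommand moved; a sketch of the same operator under that assumption:

# Airflow 2.x variant: `airflow clear` became `airflow tasks clear`,
# and the no-confirm flag is -y / --yes (dag id is still a placeholder)
last_task = BashOperator(
    task_id='last_task',
    bash_command='airflow tasks clear example_target_dag -y',
    dag=dag)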
Answer 3:
I would recommend staying away from the CLI here!
The DAG/task functionality of Airflow is much better exposed when referencing the objects directly, as compared to going through the BashOperator and/or the CLI module.
Add a PythonOperator to DAG A named "clear_dag_b" that imports dag_b from the dags folder (module) and does this:
from dags.dag_b import dag as dag_b

def clear_dag_b(**context):
    exec_date = context['execution_date']
    dag_b.clear(start_date=exec_date, end_date=exec_date)
Important! If for some reason you do not match or overlap dag_b's schedule time with start_date/end_date, the clear() operation will miss the DAG executions. This example assumes DAGs A and B are scheduled identically, and that you only want to clear day X from B when A executes day X.
It might make sense to check whether dag_b has already run or not before clearing it:
dag_b_run = dag_b.get_dagrun(exec_date)  # returns None or a DagRun object
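Putting the pieces together, a minimal sketch of the operator wiring in DAG A (this assumes Airflow 1.x, where PythonOperator takes provide_context=True; dags.dag_b and the task ids are placeholders from this answer):

from airflow.operators.python_operator import PythonOperator
from dags.dag_b import dag as dag_b

def clear_dag_b(**context):
    exec_date = context['execution_date']
    # Only clear if DAG B has already run for this execution date
    if dag_b.get_dagrun(exec_date) is not None:
        dag_b.clear(start_date=exec_date, end_date=exec_date)

clear_dag_b_task = PythonOperator(
    task_id='clear_dag_b',
    python_callable=clear_dag_b,
    provide_context=True,  # Airflow 1.x: pass the template context as kwargs
    dag=dag)               # `dag` here is DAG A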
Source: https://stackoverflow.com/questions/58180281/programmatically-clear-the-state-of-airflow-task-instances