Apache Airflow - get all parent task_ids

佛祖请我去吃肉 2020-12-07 03:11

Suppose the following situation:

[c1, c2, c3] >> child_task

where c1, c2, c3 and child_task are all tasks in the same DAG. How can I get the task_ids of all parent (upstream) tasks of child_task?

1 Answer
  •  囚心锁ツ
    2020-12-07 04:06

    The upstream_task_ids and downstream_task_ids properties of BaseOperator are meant just for this purpose.

    from typing import List
    ..
    parent_task_ids: List[str] = my_task.upstream_task_ids
    child_task_ids: List[str] = my_task.downstream_task_ids
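
    For the layout in the question, a minimal usage sketch (untested; the DummyOperator import path and the DAG arguments below are assumptions that may differ between Airflow versions) could look like this:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    with DAG(dag_id="example_dag", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
        c1 = DummyOperator(task_id="c1")
        c2 = DummyOperator(task_id="c2")
        c3 = DummyOperator(task_id="c3")
        child_task = DummyOperator(task_id="child_task")

        [c1, c2, c3] >> child_task

        # immediate parents of child_task: "c1", "c2", "c3"
        print(child_task.upstream_task_ids)
        # immediate children of c1: "child_task"
        print(c1.downstream_task_ids)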
    

    Do note, however, that with these properties you only get the immediate (upstream / downstream) neighbours of a task. In order to get all ancestor or descendant tasks, you can quickly cook up the good old graph-theory approach, such as this BFS-like implementation:

    from typing import List, Set
    from queue import Queue
    from airflow.models import BaseOperator

    def get_ancestor_tasks(my_task: BaseOperator) -> List[BaseOperator]:
        ancestor_task_ids: Set[str] = set()
        tasks_queue: Queue = Queue()
        # determine parent tasks to begin BFS
        for task in my_task.upstream_list:
            tasks_queue.put(task)
        # perform BFS
        while not tasks_queue.empty():
            task: BaseOperator = tasks_queue.get()
            # skip tasks already visited (a task can be reachable via multiple upstream paths)
            if task.task_id in ancestor_task_ids:
                continue
            ancestor_task_ids.add(task.task_id)
            for _task in task.upstream_list:
                tasks_queue.put(_task)
        # convert task_ids back to actual tasks
        ancestor_tasks: List[BaseOperator] = [task for task in my_task.dag.tasks if task.task_id in ancestor_task_ids]
        return ancestor_tasks
    

    The above snippet is NOT tested, but I'm sure you can take inspiration from it.
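
    For a quick sanity check of get_ancestor_tasks, here is a possible usage sketch along the same lines (also untested; the extra "grandparent" task and the DummyOperator-based DAG are purely illustrative assumptions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    with DAG(dag_id="example_dag", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
        grandparent = DummyOperator(task_id="grandparent")
        c1 = DummyOperator(task_id="c1")
        c2 = DummyOperator(task_id="c2")
        c3 = DummyOperator(task_id="c3")
        child_task = DummyOperator(task_id="child_task")

        grandparent >> c1
        [c1, c2, c3] >> child_task

        # all ancestors of child_task, not just its immediate parents:
        # "c1", "c2", "c3" and "grandparent"
        ancestors = get_ancestor_tasks(child_task)
        print({task.task_id for task in ancestors})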


    References

    • Get all Airflow Leaf Nodes/Tasks
    • Python Queue
    • Python 3 type-annotations
