Airflow: How to get the return output of one task to set the dependencies of the downstream tasks to run?

Posted by 老子叫甜甜 on 2019-12-08 05:11:30

Question


We have a kubernetes pod operator that will spit out a python dictionary that will define which further downstream kubernetes pod operators to run along with their dependencies and the environment variables to pass into each operator.

How do I get this python dictionary object back into the executor's context (or is it worker's context?) so that airflow can spawn the downstream kubernetes operators?

I've looked at BranchOperator and TriggerDagRunOperator and XCOM push/pull and Variable.get and Variable.set, but nothing seems to quite work.


Answer 1:


"We have a kubernetes pod operator that will spit out a python dictionary that will define which further downstream kubernetes pod operators to run"

This is possible, albeit not in the way you are trying. You'll need to define all possible KubernetesPodOperators in your workflow up front and then skip those that need not be run.

An elegant way to do this would be to attach a ShortCircuitOperator before each KubernetesPodOperator that reads the XCom (dictionary) published by the upstream KubernetesPodOperator and determines whether or not to continue with the downstream task.
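A minimal sketch of the gate callable such a ShortCircuitOperator could use. The shape of the dictionary, the task ids ("planner", "pod_a", "pod_b") and the helper names are all hypothetical, since the question does not show them. It also assumes the upstream pod actually publishes its dictionary as an XCom (for KubernetesPodOperator that typically means enabling XCom push and having the container write JSON to /airflow/xcom/return.json). The Airflow wiring is left in comments so the snippet runs without Airflow installed.

```python
# Hypothetical plan format; the real dictionary from the question may differ.
EXAMPLE_PLAN = {
    "pod_a": {"run": True, "env": {"MODE": "full"}},
    "pod_b": {"run": False, "env": {}},
}


def should_run(task_name, ti, planner_task_id="planner"):
    """Return True only if the planner's XCom marks task_name as runnable.

    ShortCircuitOperator skips downstream tasks when its callable
    returns a falsy value.
    """
    plan = ti.xcom_pull(task_ids=planner_task_id) or {}
    return bool(plan.get(task_name, {}).get("run", False))


class FakeTaskInstance:
    """Stand-in for Airflow's TaskInstance, only for trying the logic locally."""

    def __init__(self, xcom_value):
        self._xcom_value = xcom_value

    def xcom_pull(self, task_ids=None):
        return self._xcom_value


# Possible wiring in a DAG file (Airflow 2 import path, not executed here):
#
# from airflow.operators.python import ShortCircuitOperator
#
# gate_a = ShortCircuitOperator(
#     task_id="gate_pod_a",
#     python_callable=should_run,
#     op_kwargs={"task_name": "pod_a"},
# )
# planner >> gate_a >> pod_a

ti = FakeTaskInstance(EXAMPLE_PLAN)
print(should_run("pod_a", ti))  # True
print(should_run("pod_b", ti))  # False
```

Keep in mind that a short circuit affects everything downstream of the gate, so each gate should sit directly in front of the one pod it guards, on its own branch.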

EDIT-1

Actually, a cleaner way would be to just raise an AirflowSkipException within the task that you want to skip (rather than using a separate ShortCircuitOperator to do this).
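Roughly, the skip check could look like the sketch below. The dictionary layout ({"task": {"run": ..., "env": ...}}) and the helper name env_or_skip are assumptions for illustration, not something from the question. AirflowSkipException itself is real and lives in airflow.exceptions; the fallback class is only there so the snippet also runs on a machine without Airflow.

```python
try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    class AirflowSkipException(Exception):
        """Local fallback so the sketch runs where Airflow is not installed."""


def env_or_skip(task_name, plan):
    """Return the env vars for task_name, or raise to mark the task as skipped.

    `plan` is the (hypothetical) dictionary pulled from the upstream
    task's XCom.
    """
    entry = (plan or {}).get(task_name)
    if not entry or not entry.get("run", False):
        raise AirflowSkipException(
            "{} was not selected by the upstream plan".format(task_name)
        )
    return dict(entry.get("env", {}))


# One way to use it (an assumption, not the only option): subclass the
# operator, call env_or_skip(...) at the start of execute(context) with the
# dictionary from context["ti"].xcom_pull(...), then call super().execute().
```

When the exception is raised inside a task's execute, Airflow marks that task instance as skipped instead of failed, so no separate gate task is needed.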


"How do I get this python dictionary ... so that airflow can spawn the downstream kubernetes operators.."

No. You can't dynamically spawn new tasks based on the output of an upstream task.

Think of it this way: for the scheduler it is imperative to know all the tasks (their task_ids, trigger_rules, priority_weight, etc.) ahead of time so that it can execute them when the right time comes. If tasks were to keep appearing dynamically, Airflow's scheduler would have to become akin to an operating system scheduler (!). For more details read the EDIT-1 part of this answer



Source: https://stackoverflow.com/questions/55131480/airflow-how-to-get-the-return-output-of-one-task-to-set-the-dependencies-of-the
