Question
We have a KubernetesPodOperator that will spit out a Python dictionary defining which further downstream KubernetesPodOperators to run, along with their dependencies and the environment variables to pass into each operator.
How do I get this Python dictionary object back into the executor's context (or is it the worker's context?) so that Airflow can spawn the downstream KubernetesPodOperators?
I've looked at BranchOperator, TriggerDagRunOperator, XCom push/pull, and Variable.get/Variable.set, but nothing seems to quite work.
Answer 1:
We have a KubernetesPodOperator that will spit out a Python dictionary defining which further downstream KubernetesPodOperators to run
This is possible, albeit not in the way you are trying. You'll have to have all possible KubernetesPodOperators already in your workflow and then skip those that need not be run.
An elegant way to do this would be to attach a ShortCircuitOperator before each KubernetesPodOperator that reads the XCom (dictionary) published by the upstream KubernetesPodOperator and determines whether or not to continue with the downstream task.
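A minimal sketch of that pattern (not from the original answer; the task ids, images, and plan format are hypothetical), assuming Airflow 1.10-style imports and that the upstream pod publishes its dictionary through the XCom sidecar, i.e. the container writes JSON to /airflow/xcom/return.json:

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    from airflow.operators.python_operator import ShortCircuitOperator
    from airflow.utils.dates import days_ago

    dag = DAG("conditional_pods", schedule_interval=None, start_date=days_ago(1))

    generate_plan = KubernetesPodOperator(
        task_id="generate_plan",
        name="generate-plan",
        namespace="default",
        image="my-planner:latest",  # hypothetical image that produces the plan
        do_xcom_push=True,  # older 1.10.x releases call this xcom_push; the pod
                            # must write JSON to /airflow/xcom/return.json
        dag=dag,
    )

    def _should_run(task_name):
        def callable_fn(**context):
            # Pull the dictionary published by the upstream pod; returning
            # False short-circuits (skips) everything downstream of this check.
            plan = context["ti"].xcom_pull(task_ids="generate_plan") or {}
            return task_name in plan
        return callable_fn

    check_task_a = ShortCircuitOperator(
        task_id="check_task_a",
        python_callable=_should_run("task_a"),
        provide_context=True,  # needed on Airflow 1.10.x to receive **context
        dag=dag,
    )

    task_a = KubernetesPodOperator(
        task_id="task_a",
        name="task-a",
        namespace="default",
        image="my-task-a:latest",  # hypothetical image
        dag=dag,
    )

    generate_plan >> check_task_a >> task_a

Note that ShortCircuitOperator skips everything downstream of itself, which is why the answer suggests one check in front of each KubernetesPodOperator rather than a single shared check.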
EDIT-1
Actually, a cleaner way would be to just raise an AirflowSkipException within the task that you want to skip (rather than using a separate ShortCircuitOperator to do this).
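The answer doesn't include code for this, but one way to apply it to a KubernetesPodOperator is a small subclass that consults the upstream XCom before launching the pod. Everything below, including the "generate_plan" task id carried over from the previous sketch, is a hypothetical illustration:

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    from airflow.exceptions import AirflowSkipException

    class SkippableKubernetesPodOperator(KubernetesPodOperator):
        """Skips itself, instead of launching a pod, when absent from the plan."""

        def execute(self, context):
            # Pull the dictionary published by the hypothetical upstream
            # "generate_plan" task (see the previous sketch).
            plan = context["ti"].xcom_pull(task_ids="generate_plan") or {}
            if self.task_id not in plan:
                # Marks this task instance "skipped" rather than "failed".
                raise AirflowSkipException(
                    "%s not listed in the upstream plan" % self.task_id)
            # plan[self.task_id] could carry the per-task env vars mentioned
            # in the question; injecting them is left out of this sketch.
            return super(SkippableKubernetesPodOperator, self).execute(context)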
How do I get this Python dictionary ... so that Airflow can spawn the downstream KubernetesPodOperators ...
No. You can't dynamically spawn new tasks based on the output of an upstream task.
Think of it this way: for the scheduler, it is imperative to know all the tasks (their task_ids, trigger_rules, priority_weights, etc.) ahead of time so as to be able to execute them when the right time comes. If tasks were to just keep coming up dynamically, then Airflow's scheduler would have to become akin to an Operating System scheduler (!). For more details, read the EDIT-1 part of this answer.
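To make the "known ahead of time" point concrete, every attribute the scheduler relies on is fixed when the DAG file is parsed, before anything runs. A hypothetical task declaration shows this (trigger_rule "none_failed" exists from Airflow 1.10.2 onward and lets a task run even when an upstream sibling was skipped):

    task_b = KubernetesPodOperator(
        task_id="task_b",            # fixed identity, known at parse time
        trigger_rule="none_failed",  # fixed trigger rule: run unless an
                                     # upstream task actually failed
        priority_weight=10,          # fixed scheduling priority
        name="task-b",
        namespace="default",
        image="my-task-b:latest",    # hypothetical image
        dag=dag,
    )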
Source: https://stackoverflow.com/questions/55131480/airflow-how-to-get-the-return-output-of-one-task-to-set-the-dependencies-of-the