Hi guys, I am new to Airflow and Python. I need to run tasks based on the value of a variable in the input JSON. If the value of the variable 'insurance' is
The DAG definition file is continuously parsed by Airflow in the background, and the generated DAGs and tasks are picked up by the scheduler. The way your file wires tasks together creates several problems:
All 6 tasks (task1 .. task6) are ALWAYS created (and hence they will always run, irrespective of insurance_flag); only their inter-task dependencies are set in accordance with insurance_flag.
The correct way is instead to put both the task instantiation (creation of the PythonOperator taskn objects) and the task wiring within that if .. else block. That way, the unnecessary tasks won't be created (and hence won't run).
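To make the difference concrete, here is a minimal sketch of creating and wiring tasks inside the if .. else block. The Task class and build_tasks function are hypothetical stand-ins (not Airflow's API), so that the snippet runs without Airflow installed; in a real DAG file the Task(...) calls would be your operator instantiations with dag=dag.

```python
class Task:
    """Hypothetical stand-in for an Airflow operator (not the real API)."""

    def __init__(self, task_id, registry):
        self.task_id = task_id
        self.downstream = []
        registry.append(self)  # mimics a task registering itself with its DAG

    def __rshift__(self, other):
        # Mimic the `a >> b` wiring syntax; return `other` so chains work.
        self.downstream.append(other)
        return other


def build_tasks(insurance_flag):
    """Create and wire only the tasks that the flag calls for."""
    registry = []
    if insurance_flag:
        task1 = Task("task1", registry)
        task2 = Task("task2", registry)
        task3 = Task("task3", registry)
        task1 >> task2 >> task3
    else:
        task4 = Task("task4", registry)
        task5 = Task("task5", registry)
        task6 = Task("task6", registry)
        task4 >> task5 >> task6
    # Only the tasks created above exist; the other branch was never built.
    return [task.task_id for task in registry]
```

Because instantiation sits inside the conditional, the tasks of the untaken branch never exist, so there is nothing for the scheduler to run.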
While the fix above alone should be enough to fix your code, may I offer a suggestion for improvement: having a Variable read in the DAG definition file means a SQL query is fired by Airflow's SQLAlchemy ORM very frequently in the background (on every cycle of continuously parsing the DAG definition file). Reading the Variable inside the callable of a BranchPythonOperator, so that it is only read when that task runs, avoids this.
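A rough sketch of why the placement of the read matters. Everything here is a hypothetical stand-in: fake_variable_get only counts calls instead of querying a database, and each parse_* function plays the role of one parse of a DAG definition file.

```python
lookup_count = 0


def fake_variable_get(key):
    """Hypothetical stand-in for Variable.get: counts calls, returns sample data."""
    global lookup_count
    lookup_count += 1
    return {"car": {"json": {"insurance": True}}}


def parse_with_top_level_read():
    # The read sits in the file body, so every parse triggers a lookup.
    return fake_variable_get("my_var_name")["car"]["json"]["insurance"]


def parse_with_deferred_read():
    # The read sits inside the callable, so parsing only defines the function.
    def branch_decider():
        my_var_dict = fake_variable_get("my_var_name")
        return "task1" if my_var_dict["car"]["json"]["insurance"] else "task4"

    return branch_decider


def count_lookups(parse_fn, n_parses):
    """Simulate n_parses parse cycles and report how many lookups they caused."""
    global lookup_count
    lookup_count = 0
    for _ in range(n_parses):
        parse_fn()
    return lookup_count
```

In this simulation the top-level read costs one lookup per parse cycle, while the deferred read costs none until the callable itself is invoked.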
""" branch 1 """
task1 >> task2 >> task3
""" branch 2 """
task4 >> task5 >> task6
def branch_decider(**kwargs):
    my_var_dict = Variable.get('my_var_name', deserialize_json=True)
    # decide which branch to take based on insurance flag
    if my_var_dict['car']['json']['insurance']:
        return 'task1'
    else:
        return 'task4'

branch_task = BranchPythonOperator(task_id='branch_task',
                                   dag=dag,
                                   python_callable=branch_decider)
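The decision logic can be tried out without Airflow by separating it from the Variable lookup. In this sketch, choose_first_task is a hypothetical helper, and the sample JSON is an assumption about the shape of the stored value, inferred from the ['car']['json']['insurance'] indexing above.

```python
import json


def choose_first_task(my_var_dict):
    """Return the task_id of the first task of the branch to follow."""
    if my_var_dict["car"]["json"]["insurance"]:
        return "task1"
    return "task4"


# Assumed shape of the variable's JSON value (illustrative only).
insured = json.loads('{"car": {"json": {"insurance": true}}}')
uninsured = json.loads('{"car": {"json": {"insurance": false}}}')
```

Calling choose_first_task(insured) picks the first branch and choose_first_task(uninsured) picks the second; a value missing any of the three keys would raise a KeyError, just as the callable above would.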
The mandatory dag argument is missing from the task instantiations:
task1 = BashOperator(
    task_id='task1',
    bash_command='echo 1',
    dag=dag
)
A dangling PythonOperator with a callable that json.dumps the Variable, which serves no purpose (unless I misunderstood your code / intent here, remove it completely):
PythonOperator(
    task_id='sample_task',
    python_callable=sample_fun,
    op_kwargs={
        json: '{{ dag_run.car.json}}'
    },
    provide_context=True,
    dag=dag
)

def sample_fun(json, **kwargs):
    insurance_flag = json.dumps(json)['insurance']
UPDATE-1
Responding to queries raised over comments
We have used Variable.get(my_var_name). What is this my_var_name?
Variables have a key and a value; my_var_name is the key of the variable (see the Key column of the Variables list in the Airflow UI).
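As a rough illustration of the key / value relationship, the sketch below uses a plain dict as a hypothetical stand-in for Airflow's variable storage. The get_variable helper and the stored JSON string are assumptions for illustration; the real Variable.get reads from Airflow's metadata database, and its deserialize_json flag parses the stored string as JSON.

```python
import json

# Hypothetical stand-in for the variable store: key -> value stored as a string.
fake_variable_store = {
    "my_var_name": '{"car": {"json": {"insurance": true}}}',
}


def get_variable(key, deserialize_json=False):
    """Look a value up by key, optionally parsing it as JSON."""
    value = fake_variable_store[key]
    return json.loads(value) if deserialize_json else value


raw_value = get_variable("my_var_name")  # the stored string
my_var_dict = get_variable("my_var_name", deserialize_json=True)  # a dict
```

The key is only the lookup name; the nested ['car']['json']['insurance'] access works on the deserialized value.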
If the condition is satisfied, return 'task1', 'task2', 'task3', else 'task4', 'task5', 'task6'. Can we add more than one task in the return?
No (and you don't have to). BranchPythonOperator requires that its python_callable return the task_id of the first task of the branch only:
For task1, task2, task3, the first task's task_id = task1
For task4, task5, task6, the first task's task_id = task4
Furthermore, do understand that since the above two sets of tasks have already been wired together, they will naturally be executed one after another in that sequence (otherwise, what would be the point of wiring them anyway?)
task1 >> task2 >> task3
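To see why returning only the first task_id is enough, here is a simplified sketch that follows the wiring from the chosen task. The downstream dict and tasks_to_run function are hypothetical; this only models the two linear chains from this answer, whereas real Airflow scheduling also deals with skipped states and trigger rules.

```python
# Wiring from the answer, as a mapping from each task to its downstream task.
downstream = {
    "task1": "task2",
    "task2": "task3",
    "task4": "task5",
    "task5": "task6",
}


def tasks_to_run(first_task_id, downstream):
    """Follow the wiring from the chosen first task and list what runs, in order."""
    order = []
    current = first_task_id
    while current is not None:
        order.append(current)
        current = downstream.get(current)  # None once the chain ends
    return order
```

Starting from 'task1' the walk covers task1, task2 and task3; starting from 'task4' it covers task4, task5 and task6. The wiring, not the return value, carries execution through the rest of the branch.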