问题
I am new to Airflow.
I have come across a scenario, where Parent DAG need to pass some dynamic number (let's say n
) to Sub DAG.
Where as SubDAG will use this number to dynamically create n
parallel tasks.
Airflow documentation doesn't cover a way to achieve this. So I have explore couple of ways :
Option - 1(Using xcom Pull)
I have tried to pass as a xcom value, but for some reason SubDAG is not resolving to the passed value.
Parent Dag File
def load_dag(**kwargs):
number_of_runs = json.dumps(kwargs['dag_run'].conf['number_of_runs'])
dag_data = json.dumps({
"number_of_runs": number_of_runs
})
return dag_data
# ------------------ Tasks ------------------------------
load_config = PythonOperator(
task_id='load_config',
provide_context=True,
python_callable=load_dag,
dag=dag)
t1 = SubDagOperator(
task_id=CHILD_DAG_NAME,
subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, default_args, "'{{ ti.xcom_pull(task_ids='load_config') }}'" ),
default_args=default_args,
dag=dag,
)
Sub Dag File
def sub_dag(parent_dag_name, child_dag_name, args, num_of_runs):
dag_subdag = DAG(
dag_id='%s.%s' % (parent_dag_name, child_dag_name),
default_args=args,
schedule_interval=None)
variabe_names = {}
for i in range(num_of_runs):
variabe_names['task' + str(i + 1)] = DummyOperator(
task_id='dummy_task',
dag=dag_subdag,
)
return dag_subdag
Option - 2
I have also tried to pass number_of_runs
as a global variable, which was not working.
Option - 3
Also we tried to write this value to a data file. But sub DAG is throwing File doesn't exist error
. This might be because we are dynamically generating this file.
Can some one help me with this.
回答1:
I've done it with Option 3. The key is to return a valid dag with no tasks, if the file does not exist. So load_config will generate a file with your number of tasks or more information if needed. Your subdag factory would look something like:
def subdag(...):
sdag = DAG('%s.%s' % (parent, child), default_args=args, schedule_interval=timedelta(hours=1))
file_path = "/path/to/generated/file"
if os.path.exists(file_path):
data_file = open(file_path)
list_tasks = data_file.readlines()
for task in list_tasks:
DummyOperator(
task_id='task_'+task,
default_args=args,
dag=sdag,
)
return sdag
At dag generation you will see a subdag with No tasks. At dag execution, after load_config is done, you can see you dynamically generated subdag
回答2:
Option 1 should work if you just change the call to xcom_pull
to include the dag_id
of the parent dag. By default the xcom_pull
call will look for the task_id
'load_config'
in its own dag which doesnt exist.
so change the x_com call macro to:
subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, default_args, "'{{ ti.xcom_pull(task_ids='load_config', dag_id='" + PARENT_DAG_NAME + "' }}'" ),
回答3:
If the filename you are writing to is not dynamic (e.g. you are writing over the same file over and over again for each task instance), Jaime's answer will work:
file_path = "/path/to/generated/file"
But if you need a unique filename or want different content written to the file by each task instance for tasks executed in parallel, airflow will not work for this case, since there is no way to pass the execution date or variable outside of a template. Take a look at this post.
回答4:
Take a look at my answer here, in which I describe a way to create a task dynamically based on the results of a previously executed task using xcoms and subdags.
来源:https://stackoverflow.com/questions/44365716/airflow-passing-a-dynamic-value-to-sub-dag-operator