I'm trying to access some XCom data from the parent DAG at subdag creation time; I searched the internet for a way to achieve this, but I didn't find anything.
The error is simple: you are missing the `context` argument required by the `xcom_pull()` method. But you really can't just create a `context` to pass into this method; it is a Python dictionary that Airflow passes to anchor methods like `pre_execute()` and `execute()` of `BaseOperator` (the parent class of all `Operator`s).
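For illustration, here is a minimal sketch (hypothetical names, not from the original question) of the kind of call that produces this error:

```python
# Hypothetical snippet showing the failing pattern: calling xcom_pull() at
# DAG-definition time. BaseOperator.xcom_pull() needs the runtime context,
# which doesn't exist while the DAG file is being parsed.
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator  # Airflow 1.10.x path

dag = DAG(dag_id="parent_dag", start_date=datetime(2019, 1, 1))
parent_task = DummyOperator(task_id="parent_task", dag=dag)

# Raises TypeError: xcom_pull() is missing its required 'context' argument
value = parent_task.xcom_pull(task_ids="parent_task")
```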
In other words, `context` becomes available only when an `Operator` is actually executed, not during DAG definition. And it makes sense, because in the taxonomy of Airflow, `XCom`s are a communication mechanism between `task`s in real time: they talk to each other while they are running.
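For contrast, a minimal sketch (hypothetical task names) of the supported pattern, where the XCom is pulled inside a running task, once the context exists:

```python
# Hypothetical example of pulling an XCom at runtime, where it is supported.
# With provide_context=True (Airflow 1.10.x), the context dict is passed to
# the callable, and context["ti"] is the running TaskInstance.
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id="xcom_at_runtime", start_date=datetime(2019, 1, 1))

def push(**context):
    return "some value"  # the return value is pushed as an XCom automatically

def pull(**context):
    value = context["ti"].xcom_pull(task_ids="push_task")
    print(value)

push_task = PythonOperator(task_id="push_task", python_callable=push,
                           provide_context=True, dag=dag)
pull_task = PythonOperator(task_id="pull_task", python_callable=pull,
                           provide_context=True, dag=dag)
push_task >> pull_task
```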
But at the end of the day, `XCom`s, just like every other Airflow model, are persisted in the backend meta-db. So of course you can retrieve them directly from there (obviously only the `XCom`s of `task`s that have run in the past). While I don't have a code snippet, you can have a look at `cli.py`, where they've used the SQLAlchemy ORM to play with models and the backend db. Do understand that this would mean a query being fired to your backend db every time the DAG-definition file is parsed, which happens rather frequently.
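A minimal sketch of that idea (assuming Airflow 1.10.x; the session helpers and `XCom` model differ across versions, and the helper name here is made up):

```python
# Hypothetical helper: read an XCom straight from the metadata DB at
# DAG-definition time, using the SQLAlchemy ORM. Assumes Airflow 1.10.x.
from airflow.models import XCom
from airflow.utils.db import create_session

def fetch_xcom_from_db(dag_id, task_id, key="return_value"):
    """Return the latest matching XCom row's value, or None if absent."""
    with create_session() as session:
        row = (
            session.query(XCom)
            .filter(XCom.dag_id == dag_id,
                    XCom.task_id == task_id,
                    XCom.key == key)
            .order_by(XCom.execution_date.desc())
            .first()
        )
        # Note: depending on settings, XCom.value may be stored pickled or
        # as JSON, so it may need deserializing before use.
        return row.value if row else None
```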
EDIT-1
After looking at your code snippet, I got alarmed. Assuming the value returned by `xcom_pull()` keeps changing frequently, the number of `task`s in your dag will also keep changing. This can lead to unpredictable behaviours (you should do a fair bit of research on this, but I don't have a good feeling about it).
I'd suggest you revisit your entire task workflow and condense it down to a design where the

- number of `task`s and
- structure of the `DAG`

are known ahead of time (at the time of parsing of the DAG-definition file). You can of course iterate over a JSON file / the result of a SQL query (like the SQLAlchemy approach mentioned earlier) etc. to spawn your actual `task`s, but that file / db / whatever shouldn't be changing frequently.
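For instance, a minimal sketch (hypothetical file path and contents) of spawning tasks from a config that is fixed at parse time:

```python
# Hypothetical example: the task list is driven by a JSON file that is known
# (and stable) when the scheduler parses this DAG file.
import json
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x path

with open("/path/to/tables.json") as f:  # e.g. ["orders", "users", "events"]
    tables = json.load(f)

dag = DAG(dag_id="static_fan_out", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

for table in tables:
    BashOperator(
        task_id="process_{}".format(table),
        bash_command="echo processing {}".format(table),
        dag=dag,
    )
```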
Do understand that merely iterating over a list to generate `task`s is not problematic; what's NOT possible is to have the structure of your `DAG` depend on the result of an upstream `task`. For example, you can't have n `task`s created in your `DAG` based on an upstream task calculating the value of n at runtime.
So sizing your DAG from an upstream result is not possible, but generating tasks from a list known at parse time is (including what you are trying to achieve, even though the way you are doing it doesn't seem like a good idea).
EDIT-2
So as it turns out, generating tasks from the output of upstream tasks is possible after all, although it requires a significant amount of knowledge of Airflow's internal workings as well as a tinge of creativity.