Question
The Airflow documentation clearly states:

SubDAGs must have a schedule and be enabled. If the SubDAG's schedule is set to None or @once, the SubDAG will succeed without having done anything.

Although we must stick to the documentation, I've found they work without a hiccup even with schedule_interval set to None or @once. Here's my working example.
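A minimal sketch of the kind of setup described, assuming Airflow 1.9's `SubDagOperator` and `DummyOperator` APIs (the DAG and task ids here are illustrative, not taken from the original example):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 1, 1),
}


def make_subdag(parent_dag_id, child_task_id, args):
    # The subdag's dag_id must be '<parent_dag_id>.<child_task_id>';
    # note schedule_interval=None, which the docs say should not work.
    subdag = DAG(
        dag_id='%s.%s' % (parent_dag_id, child_task_id),
        default_args=args,
        schedule_interval=None,
    )
    DummyOperator(task_id='subdag_task', dag=subdag)
    return subdag


with DAG('parent_dag', default_args=default_args,
         schedule_interval='@daily') as parent_dag:
    # Airflow treats this as just another task in parent_dag
    SubDagOperator(
        task_id='child_dag',
        subdag=make_subdag('parent_dag', 'child_dag', default_args),
    )
```

This is a DAG-definition file meant to be picked up by an Airflow scheduler, so it only does anything inside a running Airflow deployment.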
My current understanding (I heard about Airflow only 2 weeks back) of SubDagOperators (or subdags) is:

- Airflow treats a subdag as just another task
- They can cause deadlock, but easy workarounds exist
My questions are:

- Why does my example work when it shouldn't?
- Why shouldn't my example work (as per the docs) in the first place?
- Are there any subtle differences between the behaviour of SubDagOperator and other operators?
- When solutions to the known problems exist, why is there so much uproar against SubDagOperators?
I'm using puckel/docker-airflow with:

- Airflow 1.9.0-4
- Python 3.6-slim
- CeleryExecutor with redis:3.2.7
Answer 1:
If you are just running your DAG once, then you probably won't have any issues with SubDags (as in your example) - especially if you have a bunch of worker slots available. Try letting a few DagRuns of your example accumulate and see if everything runs smoothly if you try to delete and re-run some.
The community has advised moving away from SubDags because unexpected behavior starts happening when you need to re-run old DagRuns or run bigger backfills.
It is not so much that the DAG won't work, but that unexpected things can happen that may affect your workflows, a risk that isn't worth taking when all you are getting in return is a nicer-looking DAG.
Even though known solutions exist, implementing them may not be worth the effort.
Source: https://stackoverflow.com/questions/51301763/schedule-interval-and-other-gotchas-with-subdagoperator