Airflow: Why is there a start_date for operators?

Submitted by 故事扮演 on 2019-11-30 20:31:58

Regarding start_date on a task instance: personally I have never used it; I always just set a single start_date on the DAG.

However, from what I can see, it would let you start certain tasks at a different time from the main DAG. It appears to be a legacy feature; the FAQ recommends using time sensors for that sort of thing instead and having one start_date for all tasks, passed through the DAG.
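For illustration, here is a minimal sketch of the time-sensor pattern the FAQ alludes to, assuming Airflow 1.10-era import paths; the dag_id, target time, and task names are hypothetical:

```python
from datetime import datetime, time

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.time_sensor import TimeSensor

# Hypothetical DAG; every task shares the single DAG-level start_date.
dag = DAG(
    dag_id="time_sensor_demo",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

# Rather than giving late_task its own start_date, gate it behind a
# sensor that waits until 06:00 on the day of each run.
wait_until_0600 = TimeSensor(
    task_id="wait_until_0600",
    target_time=time(hour=6),
    dag=dag,
)

late_task = DummyOperator(task_id="late_task", dag=dag)

wait_until_0600 >> late_task
```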

Your second question:

The execution date for a run is always the previous period based on your schedule.

From the docs (Airflow Docs):

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

To clarify:

  • On a daily schedule, the run executed on the 8th has an execution date of the 7th.
  • On a weekly schedule that runs on Sundays, the run executed this Sunday has an execution date of last Sunday (demonstrated in the sketch below).
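To see this from inside a running task, here is a minimal sketch (Airflow 1.x-style imports assumed; the dag_id and task name are hypothetical) that prints the execution_date each run receives:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id="execution_date_demo",
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",
)

def show_execution_date(execution_date, **context):
    # For the run triggered soon after 2016-01-01T23:59, this prints
    # 2016-01-01: the start of the period the run covers.
    print("execution_date:", execution_date)

PythonOperator(
    task_id="show_execution_date",
    python_callable=show_execution_date,
    provide_context=True,  # Airflow 1.x: pass the template context as kwargs
    dag=dag,
)
```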

Some complex requirements may need specific timing at the task level. For example, I may want my DAG to run each day for a full week before some aggregation logging task starts running; to achieve this I could set a later start_date on that task, as in the sketch below.
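A minimal sketch of that idea, again with Airflow 1.x imports and hypothetical dag/task names: the aggregation task gets a start_date one week after the DAG's, so it simply isn't scheduled during the first seven daily runs:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="task_level_start_date_demo",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

daily_load = DummyOperator(task_id="daily_load", dag=dag)

# Starts a week into the DAG's life; for runs before 2019-01-08 the
# scheduler never creates task instances for this task.
aggregate_logs = DummyOperator(
    task_id="aggregate_logs",
    start_date=datetime(2019, 1, 8),
    dag=dag,
)

daily_load >> aggregate_logs
```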

A bit more useful info: looking through the Airflow DAG class source, it appears that setting start_date at the DAG level simply means it is passed through to a task when no task start_date was supplied via the default_args dict and no start_date is defined on the task itself. So for any case where you want all tasks in a DAG to kick off at the same time (dependencies aside), setting start_date at the DAG level is sufficient.
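A minimal sketch of that precedence, with hypothetical dag/task names: tasks inherit start_date from default_args unless they set their own:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="start_date_precedence_demo",
    default_args={"start_date": datetime(2019, 1, 1)},  # inherited by tasks
    schedule_interval="@daily",
)

# No start_date of its own: inherits 2019-01-01 from default_args.
inherits_default = DummyOperator(task_id="inherits_default", dag=dag)

# An explicit task-level start_date overrides the inherited one.
overrides_default = DummyOperator(
    task_id="overrides_default",
    start_date=datetime(2019, 2, 1),
    dag=dag,
)
```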
