How to stop a DAG from backfilling? catchup_by_default=False and catchup=False do not seem to stop the Airflow scheduler from backfilling

不思量自难忘° 2021-01-14 14:50

The setting catchup_by_default=False in airflow.cfg does not seem to work. Adding catchup=False to the DAG doesn't work either.
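
As a quick sanity check on the config side, the following is a minimal sketch that parses a hypothetical airflow.cfg excerpt with Python's standard configparser. It assumes the option sits in the [scheduler] section, as it does in the default 1.10 config file; the excerpt itself is made up for illustration and is not the asker's actual file.

```python
import configparser

# Hypothetical excerpt of airflow.cfg; only the option discussed here is shown.
# Assumption: the option lives in the [scheduler] section.
SAMPLE_CFG = """
[scheduler]
catchup_by_default = False
"""

# interpolation=None so that any '%' characters in a real config are left alone.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CFG)

# getboolean() interprets "False", "false", "0", "no" and "off" as False.
catchup_by_default = parser.getboolean("scheduler", "catchup_by_default")
print("catchup_by_default:", catchup_by_default)

# A differently spelled key is a different option, so it is worth checking
# that the exact name is present.
has_misspelled_key = parser.has_option("scheduler", "catch_up_default")
print("misspelled key present:", has_misspelled_key)
```

To inspect a real installation, `parser.read(path_to_airflow_cfg)` could replace `read_string`; the path is left to the reader because it depends on the AIRFLOW_HOME in use.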

Here's how to reproduce the

3 Answers
  • 2021-01-14 14:55

    I know this thread is a little old, but setting catchup_by_default = False in airflow.cfg did stop Airflow from backfilling for me. (My Airflow version is 1.10.12.)

    I resent that this config is not set to False by default. This, and the fact that a DAG starts one schedule_interval after its start_date, are the two most confusing things that stump Airflow beginners.

    The first time I used Airflow, I wasted an entire afternoon trying to figure out why my test task, which was scheduled to run every 5 mins, was running in quick succession (say every 5-6 seconds). It took me a while to realize that it was backfill in action.
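
The burst of runs described above can be reproduced with a small model. This is a simplified sketch in plain Python, not Airflow's scheduler code: `runs_to_create` is a hypothetical helper, the dates are invented, and real schedulers also apply concurrency limits that are ignored here.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, interval, now, catchup):
    """Return the execution dates that are due under this simplified model."""
    execution_dates = []
    current = start_date
    # A run covers the period [current, current + interval) and only
    # becomes due once that period has fully elapsed.
    while current + interval <= now:
        execution_dates.append(current)
        current += interval
    if not catchup:
        # Without catchup, only the most recent completed period is kept.
        return execution_dates[-1:]
    return execution_dates


# Hypothetical values: a 5-minute schedule whose start_date is about
# six hours in the past when the DAG is first enabled.
start = datetime(2021, 1, 14, 9, 0)
now = datetime(2021, 1, 14, 15, 2)
every_five_minutes = timedelta(minutes=5)

backfilled = runs_to_create(start, every_five_minutes, now, catchup=True)
latest_only = runs_to_create(start, every_five_minutes, now, catchup=False)

print(len(backfilled), "runs with catchup enabled")
print(len(latest_only), "run with catchup disabled:", latest_only[0])
```

With catchup enabled, every missed 5-minute period becomes a run that is due immediately, which is why the task appears to fire every few seconds until the backlog is cleared.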

  • 2021-01-14 15:05

    To be clear, if you enabled the DAG that you specified when the time is 2018-10-22T9:00:00.000EDT (that is, 2018-10-22T13:00:00.000Z), it would be started some time after 2018-10-22T13:00:00.000Z with a run date marked 2018-10-21T00:00:00.000Z.

    This is not backfilling from the start date, but without any prior run, it does "catch up" the most recent completed valid period. I'm not sure why that's been the case in Airflow for a while, but it's documented that catchup=False means creating a single run for the most recent valid period.

    If the DAG run's run date is still confusing to you, recall that the run date is the execution_date, which is the start of the interval period. The data for the interval is only completely available at the end of the interval period, but Airflow is designed to pass in the start of the period.

    Then the next run would start sometime after 2018-10-23T00:00:00.000Z with an execution_date set as 2018-10-22T00:00:00.000Z.

    If, on the 22nd or later, you're getting any run date earlier than the 21st, or multiple runs scheduled, then yes, catchup=False is not working. But there are no other reports of that being the case in v1.10 or the v1-10-stable branch.
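
The dates in this answer can be worked through with plain datetime arithmetic. This is only an illustrative sketch: it assumes a daily schedule aligned to midnight UTC, which is what the timestamps above imply, and the variable names are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a daily schedule with periods aligned to midnight UTC.
interval = timedelta(days=1)

# The moment the DAG is enabled, taken from the answer (09:00 EDT = 13:00 UTC).
enabled_at = datetime(2018, 10, 22, 13, 0, tzinfo=timezone.utc)

# End of the most recent completed period: the latest midnight before enabling.
latest_period_end = enabled_at.replace(hour=0, minute=0, second=0, microsecond=0)

# The execution_date is the START of that period, one interval earlier.
execution_date = latest_period_end - interval

# The following run covers the next period and cannot start before it ends.
next_execution_date = execution_date + interval
next_run_earliest_start = next_execution_date + interval

print("first run execution_date:", execution_date.isoformat())
print("next run execution_date: ", next_execution_date.isoformat())
print("next run starts after:   ", next_run_earliest_start.isoformat())
```

The first run is stamped with the 21st even though it starts on the 22nd, and the run stamped with the 22nd only starts once the 23rd begins.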

  • 2021-01-14 15:12

    Like @dlamblin mentioned, and as the docs also state, Airflow creates a single DagRun for the most recent valid interval: catchup=False instructs the scheduler to only create a DAG Run for the most current instance of the DAG interval series.

    However, there was a bug when using a timedelta for schedule_interval instead of a cron expression or cron preset. This has been fixed in Airflow master with https://github.com/apache/airflow/pull/8776. We will release Airflow 1.10.11 with this fix.
