Question
The setting catchup_by_default=False in airflow.cfg does not seem to work. Adding catchup=False to the DAG doesn't work either.
Here's how to reproduce the issue. I always start from a clean slate by running airflow resetdb. As soon as I unpause the DAG, the tasks start to backfill.
Here's the setup for the dag. I'm just using the tutorial example.
from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2018, 9, 16),
    "email": ["airflow@airflow.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}
# Daily schedule expressed as a timedelta; catchup=False should suppress backfilling.
dag = DAG("tutorial", default_args=default_args, schedule_interval=timedelta(1), catchup=False)
Answer 1:
To be clear: if you enabled the DAG you specified at 2018-10-22T09:00:00.000 EDT (that is, 2018-10-22T13:00:00.000Z), it would be started some time after 2018-10-22T13:00:00.000Z with a run date marked 2018-10-21T00:00:00.000Z.
This is not backfilling from the start date, but with no prior run it does "catch up" the most recent completed valid period. I'm not sure why that has been the case in Airflow for a while, but it is documented that catchup=False means creating a single run for the most recent valid period.
If the DAG run's run date is still confusing, recall that the run date is the execution_date, which is the start of the interval period. The data for the interval is only completely available at the end of the interval period, but Airflow is designed to pass in the start of the period.
The next run would then start some time after 2018-10-23T00:00:00.000Z with an execution_date of 2018-10-22T00:00:00.000Z.
If, on the 22nd or later, you are getting any run date earlier than the 21st, or multiple runs scheduled, then yes, catchup=False is not working. But there are no other reports of that being the case in v1.10 or the v1-10-stable branch.
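The date arithmetic above can be checked with a small standalone sketch. It does not use Airflow and is not the scheduler's actual code: latest_completed_interval is a hypothetical helper, and it assumes intervals are aligned to start_date and that all datetimes are naive UTC values.

```python
from datetime import datetime, timedelta


def latest_completed_interval(start_date, interval, now):
    """Return (execution_date, period_end) for the most recent interval
    that has fully elapsed at `now`, or None if no interval has finished.

    Hypothetical helper for illustration; not part of Airflow.
    """
    # Number of whole intervals that have elapsed since start_date.
    completed = (now - start_date) // interval
    if completed < 1:
        return None
    # execution_date is the START of the last completed interval.
    execution_date = start_date + (completed - 1) * interval
    return execution_date, execution_date + interval


start = datetime(2018, 9, 16)
daily = timedelta(days=1)

# DAG enabled at 2018-10-22T13:00Z (09:00 EDT).
first = latest_completed_interval(start, daily, datetime(2018, 10, 22, 13, 0))
# Just after the following midnight UTC.
nxt = latest_completed_interval(start, daily, datetime(2018, 10, 23, 0, 0, 1))

print(first[0].isoformat())
print(nxt[0].isoformat())
```

Under these assumptions the first call yields an execution_date of 2018-10-21T00:00:00 and the second yields 2018-10-22T00:00:00, matching the run dates described in this answer.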
Answer 2:
As @dlamblin mentioned, and as the docs also state, Airflow creates a single DagRun for the most recent valid interval. catchup=False instructs the scheduler to create a DAG Run only for the most current instance of the DAG interval series.
There was, however, a bug when using a timedelta for schedule_interval instead of a cron expression or cron preset. This has been fixed in Airflow master with https://github.com/apache/airflow/pull/8776. We will release Airflow 1.10.11 with this fix.
Source: https://stackoverflow.com/questions/52177418/how-to-stop-dag-from-backfilling-catchup-by-default-false-and-catchup-false-doe