Question
For some reason, Airflow doesn't seem to trigger the latest run for a dag with a weekly schedule interval.
Current Date:
$ date
Tue Aug 9 17:09:55 UTC 2016
DAG:
from datetime import datetime
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
dag = DAG(
    dag_id='superdag',
    start_date=datetime(2016, 7, 18),
    schedule_interval=timedelta(days=7),
    default_args={
        'owner': 'Jon Doe',
        'depends_on_past': False
    }
)
BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag
)
Run scheduler
$ airflow scheduler -d superdag
You'd expect a total of four DAG Runs as the scheduler should backfill for 7/18, 7/25, 8/1, and 8/8. However, the last run is not scheduled.
EDIT 1:
I understand that, Vineet, although it doesn’t seem to explain my issue.
In my example above, the DAG’s start date is July 18.
- First DAG Run: July 18
- Second DAG Run: July 25
- Third DAG Run: Aug 1
- Fourth DAG Run: Aug 8 (not run)
Where each DAG Run processes data from the previous week.
Today being Aug 9, I would expect the Fourth DAG Run to have executed with an execution date of Aug 8, processing data for the last week (Aug 1 until Aug 8), but it hasn’t.
Answer 1:
Airflow always schedules a run for the period that has just ended. If a DAG is scheduled to run daily, then on Aug 9th it schedules a run with execution_date Aug 8th. Similarly, if the schedule interval is weekly, then on Aug 9th it schedules a run for one week back, i.e. Aug 2nd, even though that run actually executes on Aug 9th. This is just Airflow bookkeeping. In your DAG the weekly periods start on July 18, so the run with execution_date Aug 8 covers Aug 8 through Aug 15 and is triggered only once that period has ended. You can find this in the Airflow wiki (https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls):
Understanding the execution date
Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available. This date is available to you in both Jinja and a Python callable's context in many forms as documented here. As a note ds refers to date_string, not date start as may be confusing to some.
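To make the bookkeeping concrete, here is a minimal sketch in plain Python that mimics the rule "a run is triggered only after its period has ended" for the DAG in the question. It is not Airflow's scheduler code; `due_execution_dates` is a hypothetical helper written for illustration, and it assumes a fixed timedelta interval anchored at the DAG's start_date.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date, interval, now):
    """Return the execution dates whose period has fully elapsed by `now`.

    Hypothetical helper for illustration only, not part of Airflow.
    """
    dates = []
    execution_date = start_date
    # A run stamped `execution_date` covers
    # [execution_date, execution_date + interval) and becomes due
    # only once that whole period is in the past.
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


start = datetime(2016, 7, 18)           # start_date from the question
weekly = timedelta(days=7)              # schedule_interval from the question
now = datetime(2016, 8, 9, 17, 9, 55)   # "current date" from the question

for d in due_execution_dates(start, weekly, now):
    print(d.date(), "becomes due on", (d + weekly).date())
```

Under these assumptions only three runs are due on Aug 9 (execution dates July 18, July 25 and Aug 1). The Aug 8 run appears in the list once `now` reaches Aug 15.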
Answer 2:
A similar issue happened to me as well.
I solved it by manually running
airflow backfill -s start_date -e end_date DAG_NAME
where start_date and end_date cover the missing execution_date, in your case 2016-08-08.
For example,
airflow backfill -s 2016-08-07 -e 2016-08-09 DAG_NAME
Source: https://stackoverflow.com/questions/38856886/airflow-does-not-backfill-latest-run