Question
Backstory: I was running an Airflow job on a daily schedule, with a start_date of July 1, 2019. The job requested each day's data from a third party, then loaded that data into our database.
After running the job successfully for several days, I realized that the third party data source only refreshed their data once a month. As such, I was simply downloading the same data every day.
At that point, I changed the start_date to a year ago (to get previous months' info), and changed the DAG's schedule to run once a month.
How do I (in the Airflow UI) restart the DAG completely, such that it recognizes my new start_date and schedule, and runs a complete backfill as if the DAG is brand new?
(I know this backfill can be requested via the command line. However, I don't have permissions for the command line interface and the admin is unreachable.)
Answer 1:
Click on the green circle in the DAG Runs column for the job in question in the web interface. This will bring you to a list of all successful runs.
Tick the checkbox at the top left of the list header to select all instances, then in the menu above it choose "With selected" and then "Delete" in the drop-down menu. This should clear all existing DAG run instances.
If catchup_by_default is not enabled on your Airflow instance, make sure catchup=True is set on the DAG until it has finished catching up.
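As a rough illustration of the settings involved, here is a minimal sketch of the DAG arguments described in the question, plus a small standard-library helper that lists the monthly runs a catchup would be expected to create. The dag_id, the exact start_date (taken as July 1, 2018 for "a year ago"), and the helper names are assumptions for illustration only. schedule_interval is the parameter name used by Airflow releases from the time of the question. The helper only mimics the rule that a run is scheduled once its interval has fully elapsed; it does not call Airflow itself.

```python
from datetime import datetime

# Hypothetical arguments; in the DAG file they would be passed as DAG(**dag_kwargs).
dag_kwargs = {
    "dag_id": "third_party_monthly_load",  # hypothetical name
    "start_date": datetime(2018, 7, 1),    # assumed value for "a year ago"
    "schedule_interval": "@monthly",       # midnight on the first of each month
    "catchup": True,                       # create runs for every missed interval
}


def next_month(d):
    """Return the same day and time one calendar month later (d is on day 1)."""
    if d.month == 12:
        return d.replace(year=d.year + 1, month=1)
    return d.replace(month=d.month + 1)


def expected_runs(start, now):
    """List the month starts whose monthly interval has fully ended by `now`.

    Assumes `start` falls on the first of a month at midnight.
    """
    runs = []
    current = start
    while next_month(current) <= now:
        runs.append(current)
        current = next_month(current)
    return runs


runs = expected_runs(dag_kwargs["start_date"], datetime(2019, 7, 10))
print(len(runs), runs[0].date(), runs[-1].date())  # 12 2018-07-01 2019-06-01
```

With these assumed values, clearing the old runs and letting the scheduler catch up should produce twelve monthly runs, from July 2018 through June 2019.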
Source: https://stackoverflow.com/questions/56945611/airflow-re-run-dag-from-beginning-with-new-schedule