How to make Airflow schedule_interval work correctly

迷失自我 2021-01-04 13:00

I want to try using Airflow instead of cron, but schedule_interval doesn't work as I expected.

I wrote the python code like below.
And in my understanding, Air

5 Answers
  • 2021-01-04 13:14

    Try this:

    # -*- coding: utf-8 -*-
    from datetime import datetime, timedelta

    # Airflow 2.x import paths; on Airflow 1.10 use
    # airflow.operators.bash_operator.BashOperator instead.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2016, 3, 29),
    }

    dag = DAG(
        dag_id='notice_slack',
        default_args=args,
        # Cron expression: every day at 08:15
        schedule_interval="15 08 * * *",
        dagrun_timeout=timedelta(minutes=1))

    # Command to run. The trailing space stops Airflow from treating the
    # ".sh" path as a Jinja template file.
    CMD = 'bash /tmp/notice_slack.sh '

    run_this = BashOperator(
        task_id='run_transport', bash_command=CMD, dag=dag)
    

    start_date (datetime) – The start_date for the task, determines the execution_date for the first task instance. The best practice is to have the start_date rounded to your DAG’s schedule_interval.

    schedule_interval (datetime.timedelta or dateutil.relativedelta.relativedelta or str that acts as a cron expression) – Defines how often that DAG runs, this timedelta object gets added to your latest task instance’s execution_date to figure out the next schedule.

    Setting schedule_interval and bash_command to the same values you use in your crontab entry is enough.

  • 2021-01-04 13:15

    First, your start date should be in the past. Instead of 'start_date': datetime(2016, 3, 29, 8, 15), try 'start_date': datetime(2016, 2, 29, 8, 15).

    Also set catchup=False on the DAG to prevent backfills, unless backfilling is what you want.

    From the Airflow documentation: the Airflow scheduler triggers the task soon after the start_date + schedule_interval is passed.
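The rule quoted above can be sketched with plain datetime arithmetic. This only illustrates the timing rule and is not Airflow code; the function name first_trigger_time is hypothetical.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date: datetime, interval: timedelta) -> datetime:
    """Earliest moment the first run can be triggered.

    The first run covers the period that begins at start_date, and it is
    triggered only once that whole period has elapsed.
    """
    return start_date + interval


start = datetime(2016, 2, 29, 8, 15)
print(first_trigger_time(start, timedelta(days=1)))  # 2016-03-01 08:15:00
```

So with a daily interval, a start date of "today at 08:15" produces nothing until tomorrow at 08:15, which is why a start date in the past is recommended.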

    The schedule interval can be supplied as a cron expression. If you want to run it every day at 8:15 AM, the expression would be '15 8 * * *'.

    If you want to run it only on Oct 31st at 8:15 AM, the expression would be '15 8 31 10 *'.

    To supply this, set schedule_interval='15 8 * * *' in your DAG arguments.

    You can explore cron expressions further at https://crontab.guru/
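To see how the five cron fields (minute, hour, day of month, month, day of week) map to concrete times, here is a minimal sketch using only the standard library. It is deliberately simplified: each field must be '*' or a single integer, and the helpers cron_matches and next_match are hypothetical names, not part of Airflow or cron.

```python
from datetime import datetime, timedelta


def cron_matches(expr: str, moment: datetime) -> bool:
    """Check a five-field cron expression against a datetime.

    Simplified: every field is '*' or one integer. Ranges, lists, steps
    and month/day names are not handled.
    """
    fields = expr.split()
    # cron numbers weekdays from Sunday = 0; isoweekday() gives Sunday = 7.
    actual = (moment.minute, moment.hour, moment.day, moment.month,
              moment.isoweekday() % 7)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def next_match(expr: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one (leap) year ahead
        if cron_matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no match for {expr!r} within a year")


now = datetime(2016, 3, 29, 9, 0)
print(next_match("15 8 * * *", now))    # 2016-03-30 08:15:00
print(next_match("15 8 31 10 *", now))  # 2016-10-31 08:15:00
```

For real schedules, a full cron parser is the better choice; this brute-force scan is only meant to make the field order tangible.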

    Alternatively, there are Airflow presets such as @once, @hourly, @daily, @weekly, @monthly and @yearly.

    If one of these meets your requirements, it is simply schedule_interval='@hourly'.

    Lastly, you can also give the schedule as a Python timedelta object, e.g. to run every 12 hours:

    schedule_interval=timedelta(hours=12)

  • 2021-01-04 13:21

    You can try crontab.guru if you are not sure how to write the cron expression for Airflow.

  • 2021-01-04 13:30

    With the example you've given, @daily will run your job just after midnight. You could change it to timedelta(days=1), which is relative to your fixed start_date that includes 08:15. Or you could use a cron spec, schedule_interval='15 08 * * *', in which case any start date prior to 8:15 on the day BEFORE the day you want the first run would work.
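To make the difference concrete, here is a rough sketch of the schedule points each option produces for a start date of 2016-03-29 08:15. It models @daily as the cron expression '0 0 * * *' (midnight boundaries) and timedelta(days=1) as fixed steps from the start date. The generator names are hypothetical and time zones are ignored.

```python
from datetime import datetime, timedelta
from itertools import islice


def timedelta_points(start_date, interval):
    """Yield schedule points for a timedelta schedule: start_date + k * interval."""
    point = start_date
    while True:
        yield point
        point += interval


def daily_midnight_points(start_date):
    """Yield the midnights at or after start_date, as '0 0 * * *' would."""
    point = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    if point < start_date:
        point += timedelta(days=1)
    while True:
        yield point
        point += timedelta(days=1)


start = datetime(2016, 3, 29, 8, 15)
fmt = "%m-%d %H:%M"
print([p.strftime(fmt) for p in islice(timedelta_points(start, timedelta(days=1)), 3)])
# ['03-29 08:15', '03-30 08:15', '03-31 08:15']
print([p.strftime(fmt) for p in islice(daily_midnight_points(start), 3)])
# ['03-30 00:00', '03-31 00:00', '04-01 00:00']
```

The timedelta schedule keeps the 08:15 time of day from the start date, while the midnight-based schedule discards it.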

    Note that depends_on_past: False is already the default. You may have confused its behavior with catchup=False in the DAG parameters, which avoids creating past runs for the schedule intervals that elapsed between the start date and now.
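As a rough illustration of what catchup controls, the sketch below counts the schedule intervals that have fully elapsed between a start date and "now". With catchup enabled, the scheduler would create a run for each of them; with catchup=False it creates a run only for the most recent one. The helper elapsed_intervals is hypothetical and only does the arithmetic.

```python
from datetime import datetime, timedelta


def elapsed_intervals(start_date: datetime, interval: timedelta, now: datetime) -> int:
    """Number of complete schedule intervals between start_date and now."""
    if now < start_date:
        return 0
    # Dividing one timedelta by another with // yields a whole number.
    return (now - start_date) // interval


start = datetime(2016, 2, 29, 8, 15)
now = datetime(2016, 3, 29, 9, 0)
print(elapsed_intervals(start, timedelta(days=1), now))  # 29
```

With a start date one month back and a daily schedule, that is 29 backfilled runs unless catchup is turned off.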

  • 2021-01-04 13:35

    Airflow will start your DAG once 2016/03/30 8:15:00 + schedule interval (daily) has passed, so your DAG will first run on 2016/03/31 8:15:00.

    You can check the Airflow FAQ for details.
