Running an Airflow DAG every X minutes

Submitted by 与世无争的帅哥 on 2020-01-02 03:31:06

Question


I am using Airflow on an EC2 instance with the LocalScheduler option. I've invoked airflow scheduler and airflow webserver, and everything seems to be running fine. That said, after supplying the cron string '*/10 * * * *' to schedule_interval to mean "do this every 10 minutes," the job continues to execute only every 24 hours, the default. Here's the header of the code:

from datetime import datetime
import os
import sys

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

import ds_dependencies

SCRIPT_PATH = os.getenv('PREPROC_PATH')

if SCRIPT_PATH:
    sys.path.insert(0, SCRIPT_PATH)
    import workers
else:
    print('Define PREPROC_PATH value in environment variables')
    sys.exit(1)

default_args = {
  'start_date': datetime(2017, 9, 9, 10, 0, 0, 0), #..EC2 time. Equal to 11pm Mexico time
  'max_active_runs': 1,
  'concurrency': 4,
  'schedule_interval': '*/10 * * * *' #..every 10 minutes
}

DAG = DAG(
  dag_id='dash_update',
  default_args=default_args
)

...

Answer 1:


default_args is only meant to supply default parameters to the operators within a DAG. max_active_runs, concurrency, and schedule_interval are all parameters for initializing the DAG itself, not its operators, so they must be passed to the DAG constructor directly. This is what you want:

DAG = DAG(
  dag_id='dash_update',
  start_date=datetime(2017, 9, 9, 10, 0, 0, 0), #..EC2 time. Equal to 11pm Mexico time
  max_active_runs=1,
  concurrency=4,
  schedule_interval='*/10 * * * *', #..every 10 minutes
  default_args=default_args,
)

I've mixed them up before as well, so for reference (note there are overlaps):

DAG parameters: https://airflow.incubator.apache.org/code.html?highlight=dag#airflow.models.DAG
Operator parameters: https://airflow.incubator.apache.org/code.html#baseoperator



Source: https://stackoverflow.com/questions/46182852/running-an-airflow-dag-every-x-minutes
