airflow-scheduler

Airflow 1.10 - Scheduler Startup Fails

偶尔善良 submitted on 2019-12-01 20:18:59
I've just painfully installed Airflow 1.10, thanks to my previous post here. We have a single EC2 instance running, our queue is AWS ElastiCache for Redis, and our metadata database is AWS RDS for PostgreSQL. Airflow works with this setup just fine on version 1.9, but on version 1.10 we run into an issue when starting the scheduler:

    [2018-08-15 16:29:14,015] {jobs.py:385} INFO - Started process (PID=15778) to work on /home/ec2-user/airflow/dags/myDag.py
    [2018-08-15 16:29:14,055] {jobs.py:1782} INFO - Processing file /home/ec2-user/airflow/dags/myDag.py

Scheduling dag runs in Airflow

↘锁芯ラ submitted on 2019-12-01 18:41:22
Got a general query on Airflow: is it possible to have one DAG scheduled based on another DAG's schedule? For example, with two DAGs named dag1 and dag2, I am trying to see if I can have dag2 run each time dag1 succeeds; otherwise dag2 should not run. Is this possible in Airflow?

You will want to add a TriggerDagRunOperator at the end of dag1 and set the schedule of dag2 to None. In addition, if you want to handle multiple cases for the output of dag1, you can add a BranchPythonOperator to create multiple paths based on its output. For example, you could set it to either execute the
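A minimal sketch of that pattern, assuming Airflow 1.10-era import paths; the DAG IDs, dates, and task IDs are illustrative, not taken from the question:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    dag1 = DAG('dag1', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    # Last task in dag1: fires a run of dag2 when (and only when) it is reached.
    trigger_dag2 = TriggerDagRunOperator(
        task_id='trigger_dag2',
        trigger_dag_id='dag2',
        dag=dag1,
    )

    # dag2 lives in its own file with schedule_interval=None, so it runs
    # only when dag1 triggers it:
    # dag2 = DAG('dag2', start_date=datetime(2018, 1, 1), schedule_interval=None)

Because downstream tasks run only when their upstream succeeds (the default all_success trigger rule), putting the trigger at the very end of dag1 gives the "only when dag1 succeeds" behaviour.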

Airflow External sensor gets stuck at poking

三世轮回 submitted on 2019-12-01 16:20:48
I want one DAG to start after the completion of another DAG. One solution is to use an external task sensor; below you can find my attempt. The problem I encounter is that the dependent DAG gets stuck at poking. I checked this answer and made sure that both DAGs run on the same schedule. My simplified code is as follows; any help would be appreciated. Leader DAG:

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2015, 6, 1),
        'retries': 1
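For reference, a minimal sketch of the sensor side, assuming Airflow 1.10 import paths; the leader_dag and final_task identifiers are placeholders for whatever the leader DAG actually defines:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.sensors.external_task_sensor import ExternalTaskSensor

    # The sensor pokes for a leader task instance with the SAME execution_date,
    # so the schedules must match, or you must pass execution_delta /
    # execution_date_fn to translate between them; otherwise it pokes forever.
    dag = DAG('dependent_dag', start_date=datetime(2015, 6, 1),
              schedule_interval='@daily')

    wait_for_leader = ExternalTaskSensor(
        task_id='wait_for_leader',
        external_dag_id='leader_dag',   # placeholder id
        external_task_id='final_task',  # placeholder: last task in the leader DAG
        dag=dag,
    )

    do_work = DummyOperator(task_id='do_work', dag=dag)
    wait_for_leader >> do_work

A mismatched execution_date (different schedule_interval or start_date between the two DAGs) is the most common reason the sensor never stops poking.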

Tasks added to DAG during runtime fail to be scheduled

非 Y 不嫁゛ submitted on 2019-12-01 10:37:51
My idea is to have a task foo which generates a list of inputs (users, reports, log files, etc.), and to launch a task for every element in that list. The goal is to make use of Airflow's retrying and other logic instead of reimplementing it. So, ideally, my DAG should look something like this: the only variable here is the number of tasks generated. I want to do some more tasks after all of these are completed, so spinning up a new DAG for every task does not seem appropriate. This is my code:

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2015, 6
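The underlying constraint, and a common parse-time workaround, in a minimal sketch (the input list and callables here are illustrative): Airflow schedules only the tasks that exist when the DAG file is parsed, so tasks appended from inside a running task are never picked up. If the inputs can be enumerated (or cheaply computed) at parse time, generate one task per input in a loop:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('fan_out', start_date=datetime(2015, 6, 1),
              schedule_interval='@daily')

    def process(item, **kwargs):
        print('processing %s' % item)

    def finish(**kwargs):
        print('all inputs processed')

    # Known at parse time, NOT produced by an upstream task at run time.
    inputs = ['users', 'reports', 'logs']

    join = PythonOperator(task_id='join', python_callable=finish, dag=dag)

    for item in inputs:
        t = PythonOperator(
            task_id='process_%s' % item,
            python_callable=process,
            op_kwargs={'item': item},
            dag=dag,
        )
        t >> join  # 'join' runs only after every generated task succeeds

If the list is only known at run time, the usual 1.x answers are a TriggerDagRunOperator fan-out or re-reading the list from an external store inside a single task.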

Airflow: How to push xcom value from PostgresOperator?

六眼飞鱼酱① submitted on 2019-12-01 06:35:47
I'm using Airflow 1.8.1 and I want to push the result of a SQL request from PostgresOperator. Here are my tasks:

    check_task = PostgresOperator(
        task_id='check_task',
        postgres_conn_id='conx',
        sql="check_task.sql",
        xcom_push=True,
        dag=dag)

    def py_is_first_execution(**kwargs):
        value = kwargs['ti'].xcom_pull(task_ids='check_task')
        print 'count ----> ', value
        if value == 0:
            return 'next_task'
        else:
            return 'end-flow'

    check_branch = BranchPythonOperator(
        task_id='is-first-execution',
        python_callable=py_is_first_execution,
        provide_context=True,
        dag=dag)

and here is my SQL script:

    select count(1) from
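For context: in Airflow 1.8, PostgresOperator executes the SQL but discards the result, so there is nothing for XCom to carry even with xcom_push=True. The usual workaround is to run the query through PostgresHook inside a PythonOperator and return the value; Airflow pushes a callable's return value to XCom automatically. A minimal sketch, reusing the conx connection (the table name is illustrative, since the original query is truncated):

    from airflow.hooks.postgres_hook import PostgresHook
    from airflow.operators.python_operator import PythonOperator

    def run_check(**kwargs):
        hook = PostgresHook(postgres_conn_id='conx')
        row = hook.get_first("select count(1) from my_table")  # returns e.g. (0,)
        return row[0]  # pushed to XCom under the key 'return_value'

    check_task = PythonOperator(
        task_id='check_task',
        python_callable=run_check,
        dag=dag)

With this, xcom_pull returns the bare count, so the value == 0 comparison in the branch callable works as written.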

Airflow: Why is there a start_date for operators?

故事扮演 submitted on 2019-11-30 20:31:58
I don't understand why we need a start_date for operators (task instances). Shouldn't the one that we pass to the DAG suffice? Also, if the current time is 7 Feb 2018 8:30 am UTC and I now set the start_date of the DAG to 7 Feb 2018 0:00 am, with a cron schedule interval of 30 9 * * * (daily at 9:30 am, i.e. expecting it to run within the next hour), will my DAG run today at 9:30 am or tomorrow (8 Feb at 9:30 am)?

Regarding start_date on a task instance: personally I have never used it, I always just have a single DAG start_date. However, from what I can see this would
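On the second question, the answer follows from Airflow's interval semantics: a run executes only after its schedule interval has closed. A worked sketch of the scenario (Airflow 1.x behaviour):

    from datetime import datetime
    from airflow import DAG

    dag = DAG(
        'interval_demo',
        start_date=datetime(2018, 2, 7),   # 2018-02-07 00:00 UTC
        schedule_interval='30 9 * * *',    # daily at 09:30
    )

    # First interval: 2018-02-07 09:30 -> 2018-02-08 09:30.
    # The scheduler launches a run once the interval ends, so the first run
    # fires around 2018-02-08 09:30, carrying execution_date 2018-02-07 09:30.

So the DAG runs tomorrow (8 Feb at 9:30 am), not today.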

Airflow scheduler is slow to schedule subsequent tasks

余生颓废 submitted on 2019-11-30 12:49:13
Question: When I try to run a DAG in Airflow 1.8.0, I find that a lot of time passes between the completion of a predecessor task and the moment the successor task is picked up for execution (often longer than the execution times of the individual tasks). The scenario is the same for the Sequential, Local, and Celery executors. Is there a way to reduce this overhead, e.g. any parameters in airflow.cfg that can speed up DAG execution? A Gantt chart has been added for reference.

Answer 1: As
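The knobs usually cited for this live in the [scheduler] section of airflow.cfg. The option names below existed in the 1.8-era config; the values are illustrative and workload-dependent, not recommendations:

    [scheduler]
    # How often the scheduler heartbeats and looks for runnable tasks.
    scheduler_heartbeat_sec = 5
    # How often running jobs heartbeat their status.
    job_heartbeat_sec = 5
    # Minimum seconds between re-parses of the same DAG file (0 = eager).
    min_file_process_interval = 0
    # Number of DAG-processing threads the scheduler uses.
    max_threads = 2

Tightening the heartbeat intervals shrinks the idle gap between tasks at the cost of more scheduler and database load.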