airflow-scheduler

Airflow DAG not getting scheduled

Submitted by 徘徊边缘 on 2019-12-11 04:31:49
Question: I am new to Airflow and created my first DAG. Here is my DAG code. I want the DAG to start now and thereafter run once a day.

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime.now(),
        'email': ['aaaa@gmail.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        'alamode
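A likely cause (the answer is not quoted above, so this is an assumption) is the dynamic start_date: the scheduler re-parses the file constantly, so datetime.now() keeps moving forward and the first daily interval never closes, which means no run is ever triggered. A minimal sketch of the usual fix, reusing the dag_id 'alamode' from the truncated snippet and assuming a '@daily' schedule is intended:

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2019, 12, 10),  # static date in the past, not datetime.now()
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG('alamode', default_args=default_args, schedule_interval='@daily')

    t1 = BashOperator(task_id='say_hello', bash_command='echo hello', dag=dag)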

Run Stored Procedure in Airflow

Submitted by 蹲街弑〆低调 on 2019-12-08 05:04:32
Question: I am trying to run my stored procedure from Airflow. I imported the MSSQL operator and tried to execute the following:

    sql_command = """ EXEC [spAirflowTest] """
    t3 = MsSqlOperator(
        task_id = 'run_test_proc',
        mssql_conn_id = 'FIConnection',
        sql = sql_command,
        dag = dag,
        database = 'RDW')

The task completes as successful, but the stored procedure is never actually executed. Because I get no error from the system, I also cannot identify the problem. To identify whether the call even arrived at my Microsoft SQL Server, I checked
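One thing worth checking (an assumption, since no answer is quoted here) is the operator's autocommit setting: MsSqlOperator defaults to autocommit=False, so the EXEC can be sent and then rolled back when the connection closes, which looks exactly like "success but nothing happened". A sketch reusing the ids from the question:

    from airflow.operators.mssql_operator import MsSqlOperator

    sql_command = "EXEC [spAirflowTest]"

    t3 = MsSqlOperator(
        task_id='run_test_proc',
        mssql_conn_id='FIConnection',
        sql=sql_command,
        database='RDW',
        autocommit=True,   # commit the procedure's work instead of discarding it
        dag=dag,
    )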

Airflow 1.10.3 - Blank “Recent Tasks” and “DAG Runs”

Submitted by 放肆的年华 on 2019-12-07 11:48:15
Question: I installed Airflow 1.10.3 on Ubuntu 18.10 and am able to add my DAGs and run them, but "Recent Tasks" and "DAG Runs" in the Web UI are blank. All I see is a black dotted circle that keeps loading, but nothing ever materializes. I recently switched my Airflow db to MySQL to see if that would fix it, but everything is still the same. Is this a configuration issue in airflow.cfg or something else? Answer 1: Apparently the DAG name can break the HTML document variable querySelector for "Recent Tasks"
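document.querySelector treats characters such as "." as CSS selector syntax, so a dag_id like "my.team.dag" can silently break the JavaScript that fills in those columns (an assumption consistent with the truncated answer). Renaming the DAG to use only letters, digits, underscores and hyphens is the usual workaround; a sketch:

    from airflow import DAG
    from datetime import datetime

    # 'my.team.dag' -> 'my_team_dag': no dots or other CSS-special characters in the id
    dag = DAG(
        dag_id='my_team_dag',
        start_date=datetime(2019, 1, 1),
        schedule_interval='@daily',
    )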

Airflow + Cluster + Celery + SQS - Airflow Worker: 'Hub' object has no attribute '_current_http_client'

Submitted by 旧时模样 on 2019-12-06 07:11:31
I'm trying to cluster my Airflow setup and I'm using this article to do so. I configured my airflow.cfg file to use the CeleryExecutor, pointed my sql_alchemy_conn to my PostgreSQL database that's running on the same master node, set the broker_url to use AWS SQS (I didn't set the access_key_id or secret_key, since it's running on an EC2 instance it doesn't need those), and set the celery_result_backend to my PostgreSQL server too. I saved my new airflow.cfg changes, ran airflow initdb, and then ran airflow scheduler, which worked. I went to the UI and turned on one of my
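For reference, a sketch of the airflow.cfg sections described above, assuming Airflow 1.9-era key names (newer 1.10 releases rename celery_result_backend to result_backend); the connection strings are placeholders:

    [core]
    executor = CeleryExecutor
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@master-node:5432/airflow

    [celery]
    broker_url = sqs://
    celery_result_backend = db+postgresql://airflow:airflow@master-node:5432/airflow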

How do I add a new dag to a running airflow service?

Submitted by 若如初见. on 2019-12-05 18:27:55
I have an Airflow service that is currently running as separate Docker containers for the webserver and scheduler, both backed by a PostgreSQL database. I have the DAGs synced between the two instances, and the DAGs load appropriately when the services start. However, if I add a new DAG to the dag folder (on both containers) while the service is running, the DAG gets loaded into the DagBag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that doesn't feel right. Is there a better way for the scheduler and webserver to sync up with the database?

Airflow latency between tasks

Submitted by 时光毁灭记忆、已成空白 on 2019-12-05 12:10:14
As you can see in the image, Airflow spends too much time between task executions; it represents almost 30% of the DAG's execution time. I've changed the airflow.cfg file to:

    job_heartbeat_sec = 1
    scheduler_heartbeat_sec = 1

but I still have the same latency rate. Why does it behave this way? It is by design. For instance, I use Airflow to perform large workflows where some tasks can take a really long time. Airflow is not meant for tasks that take seconds to execute; it can of course be used for that, but it might not be the most suitable tool. With that said there is not much that you

How to run one airflow task and all its dependencies?

Submitted by ぐ巨炮叔叔 on 2019-12-05 01:39:14
I suspected that airflow run dag_id task_id execution_date would run all upstream tasks, but it does not. It simply fails when it sees that not all dependent tasks have run. How can I run a specific task and all of its dependencies? I am guessing this is not possible because of an Airflow design decision, but is there a way to get around it? You can run a task independently by using the -i/-I/-A flags along with the run command, but the design of Airflow does not permit running a specific task and all its dependencies. You can backfill the dag by removing non-related tasks from the DAG for
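A quick sketch of what those flags look like on the Airflow 1.x CLI (the dag/task ids and date are placeholders; confirm the exact flag names with airflow run --help on your version):

    airflow run -i my_dag my_task 2019-12-01    # ignore task-specific dependency checks
    airflow run -I my_dag my_task 2019-12-01    # ignore depends_on_past only
    airflow run -A my_dag my_task 2019-12-01    # ignore all non-critical dependencies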

Scheduling dag runs in Airflow

Submitted by ﹥>﹥吖頭↗ on 2019-12-04 03:46:54
Question: A general query on Airflow: is it possible to have a DAG scheduled based on another DAG's schedule? For example, if I have 2 DAGs, dag1 and dag2, I am trying to see if I can have dag2 run each time dag1 succeeds, and otherwise dag2 does not run. Is this possible in Airflow? Answer 1: You will want to add a TriggerDagRunOperator at the end of dag1 and set the schedule of dag2 to None. In addition, if you want to handle multiple cases for the output of dag1, you can add in a
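A minimal sketch of that setup, assuming Airflow 1.10.x module paths (older releases may also expect a python_callable argument on TriggerDagRunOperator); the dag ids and the stand-in task are placeholders:

    from airflow import DAG
    from airflow.operators.dagrun_operator import TriggerDagRunOperator
    from airflow.operators.dummy_operator import DummyOperator
    from datetime import datetime

    # dag1 runs on its own schedule
    dag1 = DAG('dag1', start_date=datetime(2019, 1, 1), schedule_interval='@daily')
    # dag2 has no schedule of its own; it only runs when triggered
    dag2 = DAG('dag2', start_date=datetime(2019, 1, 1), schedule_interval=None)

    work = DummyOperator(task_id='dag1_work', dag=dag1)   # stand-in for dag1's real tasks

    # triggers dag2 only after the upstream dag1 tasks have succeeded
    trigger_dag2 = TriggerDagRunOperator(
        task_id='trigger_dag2',
        trigger_dag_id='dag2',
        dag=dag1,
    )
    work >> trigger_dag2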

Airflow S3KeySensor - How to make it continue running

Submitted by 只谈情不闲聊 on 2019-12-03 23:48:20
With the help of this Stack Overflow post, I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs is triggered and I then perform some work using the BashOperator. Once it's done, though, the DAG is no longer in a running state but instead goes into a success state, and if I want it to pick up another file I need to clear all the 'Past', 'Future', 'Upstream', 'Downstream' activity. I would like to make this program so that it's always running, and any time a new file is placed inside the S3 bucket the program kicks off
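One common pattern for this (an assumption, since the accepted answer is not shown here) is to let the DAG re-trigger itself after it finishes, so a fresh run is always sitting on the S3KeySensor waiting for the next file. A sketch assuming Airflow 1.10.x module paths; the bucket, key pattern and ids are placeholders:

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.dagrun_operator import TriggerDagRunOperator
    from airflow.sensors.s3_key_sensor import S3KeySensor
    from datetime import datetime

    dag = DAG('s3_watcher', start_date=datetime(2019, 1, 1), schedule_interval='@once')

    # block until a matching object appears in the bucket
    wait_for_file = S3KeySensor(
        task_id='wait_for_file',
        bucket_key='s3://my-bucket/incoming/*',
        wildcard_match=True,
        dag=dag,
    )

    process_file = BashOperator(task_id='process_file', bash_command='echo processing', dag=dag)

    # kick off a new run of this same DAG so it goes back to waiting for the next file
    restart = TriggerDagRunOperator(task_id='restart', trigger_dag_id='s3_watcher', dag=dag)

    wait_for_file >> process_file >> restart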

Incorrect work of scheduler interval and start time in Apache Airflow

Submitted by 老子叫甜甜 on 2019-12-03 21:10:23
I can't find a solution for the start time of my tasks. I have the code below and can't find where I'm wrong. When I ran the DAG, the tasks for 25.03, 26.03 and 27.03 completed, but today (28.03) the tasks did not start at 6:48. I have tried cron expressions, pendulum and datetime, and the result is the same. Local time (UTC+3) and Airflow's time (UTC) are different. I've tried using each time (local, Airflow) in the start date or schedule interval - no result. Using: Ubuntu, Airflow v. 1.9.0 and the LocalExecutor.

    emailname = Variable.get('test_mail')
    l_start_date = datetime(2018, 3, 25, 6, 48)
    l_schedule_interval = '@daily'
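Two things usually explain this behaviour (stated as an assumption, since no answer is quoted): Airflow 1.9 works in UTC, and a run for a given interval is only triggered after that interval has ended, so the run stamped with today's date actually starts tomorrow. Also, '@daily' means midnight UTC and ignores the time-of-day in start_date. A sketch assuming the intent is "run every day at 06:48 local time (UTC+3)"; the dag_id is a placeholder:

    from airflow import DAG
    from datetime import datetime

    l_start_date = datetime(2018, 3, 25, 3, 48)   # 06:48 UTC+3 expressed as UTC
    l_schedule_interval = '48 3 * * *'            # daily at 03:48 UTC == 06:48 local

    dag = DAG(
        'daily_test_dag',
        start_date=l_start_date,
        schedule_interval=l_schedule_interval,
    )
    # note: the run with execution_date 2018-03-27 starts at 2018-03-28 03:48 UTC,
    # once its daily interval has closed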