airflow-scheduler

Can Airflow be used to run a never-ending task?

♀尐吖头ヾ submitted on 2019-12-12 23:40:46
Question: Can we use an Airflow DAG to define a never-ending job (i.e. a task with an unconditional loop that consumes stream data) by setting the task/DAG timeout to None and triggering its runs manually? Would having Airflow monitor a never-ending task cause a problem? Thanks.

Answer 1: A bit odd to run this through Airflow, but yeah, I don't think that's an issue. Just note that if you restart the worker running the job (assuming the CeleryExecutor), you'll interrupt the task and will need to kick it off again manually.
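A minimal sketch of that idea (hypothetical names, Airflow 1.10-style imports, not the asker's actual code): the DAG has no schedule and no timeouts, so a manual trigger starts a single run whose task never finishes on its own.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def consume_stream():
        # Placeholder for the unconditional consumer loop.
        while True:
            pass  # read and process the next stream record here


    dag = DAG(
        dag_id='never_ending_consumer',   # hypothetical name
        schedule_interval=None,           # run only when triggered manually
        start_date=datetime(2019, 1, 1),
        catchup=False,
    )

    consume = PythonOperator(
        task_id='consume',
        python_callable=consume_stream,
        execution_timeout=None,           # no timeout; the task may run forever
        dag=dag,
    )

You would kick it off once with "airflow trigger_dag never_ending_consumer" and, as the answer notes, re-trigger it manually after any worker restart.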

Airflow S3KeySensor - How to make it continue running

家住魔仙堡 submitted on 2019-12-12 08:24:02
Question: With the help of this Stack Overflow post, I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs is triggered, and I then perform some work using the BashOperator. Once it's done, though, the DAG is no longer in a running state; instead it goes into a success state, and if I want it to pick up another file I need to clear all the 'Past', 'Future', 'Upstream', 'Downstream' activity. I would like to make this program so
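One common workaround (a hedged sketch, not the post's own code; Airflow 1.10-style imports and hypothetical bucket/DAG names) is to end the DAG with a TriggerDagRunOperator that re-triggers the same DAG, so after each file is processed a fresh run starts and the sensor waits for the next file:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.dagrun_operator import TriggerDagRunOperator
    from airflow.sensors.s3_key_sensor import S3KeySensor

    dag = DAG(
        dag_id='s3_watcher',
        schedule_interval=None,
        start_date=datetime(2019, 1, 1),
        catchup=False,
    )

    wait_for_file = S3KeySensor(
        task_id='wait_for_file',
        bucket_key='s3://my-bucket/incoming/*',  # hypothetical bucket/prefix
        wildcard_match=True,
        dag=dag,
    )

    process = BashOperator(
        task_id='process',
        bash_command='echo "processing the new file"',  # placeholder work
        dag=dag,
    )

    retrigger = TriggerDagRunOperator(
        task_id='retrigger_self',
        trigger_dag_id='s3_watcher',  # start a fresh run of this same DAG
        dag=dag,
    )

    wait_for_file >> process >> retrigger

Each run still ends in success, which is expected; the re-trigger is what keeps the pipeline watching.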

Run a DAG with tasks of different intervals

大城市里の小女人 submitted on 2019-12-11 18:19:04
Question: I have 3 tasks: A, B, and C. I want to run task A only once, then run task B monthly until end_date, then run task C only once to clean up. This is similar to this question, but not applicable: How to handle different task intervals on a single Dag in airflow? Thanks for your help.

Answer 1: For task A, which is supposed to run only once, you can take inspiration from here. As far as tasks B and C are concerned, they can be tied to A using a ShortCircuitOperator (as already described in the link you
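A hedged sketch of that gating idea (hypothetical names and dates): the DAG itself runs monthly, and ShortCircuitOperators let A proceed only on the first scheduled run and C only on the last. One gotcha: a ShortCircuitOperator skips everything downstream of it, so B must not be placed downstream of A's gate.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import ShortCircuitOperator

    dag = DAG(
        dag_id='mixed_interval_dag',
        schedule_interval='@monthly',
        start_date=datetime(2019, 1, 1),
        end_date=datetime(2019, 12, 1),
        catchup=True,
    )


    def is_first_run(execution_date, **_):
        return execution_date.strftime('%Y-%m') == '2019-01'


    def is_last_run(execution_date, **_):
        return execution_date.strftime('%Y-%m') == '2019-11'


    gate_a = ShortCircuitOperator(task_id='gate_a', python_callable=is_first_run,
                                  provide_context=True, dag=dag)
    gate_c = ShortCircuitOperator(task_id='gate_c', python_callable=is_last_run,
                                  provide_context=True, dag=dag)

    task_a = DummyOperator(task_id='task_a', dag=dag)  # one-time setup
    task_b = DummyOperator(task_id='task_b', dag=dag)  # monthly work
    task_c = DummyOperator(task_id='task_c', dag=dag)  # one-time cleanup

    # task_b is deliberately kept off the gated branches, so it runs every month.
    gate_a >> task_a
    gate_c >> task_c

The execution_date of the final run depends on Airflow's schedule semantics (the last @monthly run before an end_date of 2019-12-01 executes for the 2019-11 period), so the dates above are illustrative.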

In Apache Airflow, DAG won't run due to a duplicate-entry problem in the task_instance table

喜你入骨 submitted on 2019-12-11 15:52:16
Question: All day today I have been getting this error in the Airflow scheduler: sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry '%' for key 'PRIMARY'"). Because of this the Airflow scheduler would stop, and every time I ran it I had the same problem.

Answer 1: This is due to MySQL's ON UPDATE CURRENT_TIMESTAMP behavior, and it is tracked in Airflow's JIRA: https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3045?filter=allopenissues I fixed this by altering
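The fix discussed around that JIRA issue (a hedged sketch; take the actual column names and precision from your own schema) is to stop MySQL from implicitly attaching ON UPDATE CURRENT_TIMESTAMP to TIMESTAMP columns, either by setting explicit_defaults_for_timestamp = 1 in my.cnf, which Airflow's documentation asks for when using MySQL, or by redefining the affected column without that attribute:

    -- Check whether the column carries the implicit attribute.
    SHOW CREATE TABLE task_instance;

    -- Redefine it without ON UPDATE CURRENT_TIMESTAMP.
    ALTER TABLE task_instance
        MODIFY execution_date TIMESTAMP(6) NOT NULL;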

SQL Server Hook and operator connection in Airflow

落爺英雄遲暮 submitted on 2019-12-11 15:26:40
Question: I am new to using Airflow, and what I need to do is use the MssqlHook or MssqlOperator, but I do not know how. I use the hook and operator in the code below:

    hook = MsSqlHook(mssql_conn_id=ms_sql)
    t2 = MsSqlOperator(
        task_id='sql-op',
        mssql_conn_id=ms_sql,
        sql='Select Current_date()',
        dag=dag)

In the Airflow UI connections:

    Conn Id - ms_sql
    Conn Type - Microsoft SQL Server
    Host - host_name
    Schema - default
    Login - username
    Password - password
    Port - 14481

And when I do this, the error is: Connection failed
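For reference, a hedged sketch of the intended shape (Airflow 1.10-style imports, hypothetical DAG name): the connection id is passed as the string 'ms_sql', matching the Conn Id configured in the UI. Also note that Current_date() is not a SQL Server function; T-SQL uses GETDATE().

    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.mssql_hook import MsSqlHook
    from airflow.operators.mssql_operator import MsSqlOperator

    dag = DAG(
        dag_id='mssql_example',           # hypothetical name
        schedule_interval=None,
        start_date=datetime(2019, 1, 1),
    )

    hook = MsSqlHook(mssql_conn_id='ms_sql')  # conn id as a string

    t2 = MsSqlOperator(
        task_id='sql-op',
        mssql_conn_id='ms_sql',
        sql='SELECT GETDATE();',          # valid T-SQL, unlike Current_date()
        dag=dag,
    )

It is also worth checking that port 14481 is reachable from the worker and entered in the Port field rather than appended to the Host.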

Airflow list_dags times out after exactly 30 seconds

℡╲_俬逩灬. submitted on 2019-12-11 15:15:50
Question: I have a dynamic Airflow DAG (backfill_dag) that basically reads an admin Variable (JSON) and builds itself. backfill_dag is used for backfilling/history loading; for example, if I want to history-load DAGs x, y, and z in some order (x and y run in parallel, z depends on x), then I describe this in a particular JSON format and put it in the admin Variable of backfill_dag. backfill_dag then parses the JSON, renders the tasks of the DAGs x, y, and z, and builds itself dynamically with x and y in
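A hedged sketch of that dynamic-build pattern (hypothetical Variable name and JSON layout). Two details may matter for the timeout: Variable.get() at parse time hits the metadata database on every DAG-file parse, and airflow list_dags parses files under the dagbag_import_timeout setting, whose default is exactly 30 seconds.

    from datetime import datetime

    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.dummy_operator import DummyOperator

    # Hypothetical JSON shape: {"x": [], "y": [], "z": ["x"]}, mapping each
    # task name to the list of tasks it depends on.
    config = Variable.get('backfill_dag_config', deserialize_json=True,
                          default_var={})

    dag = DAG(
        dag_id='backfill_dag',
        schedule_interval=None,
        start_date=datetime(2019, 1, 1),
    )

    tasks = {name: DummyOperator(task_id=name, dag=dag) for name in config}

    for name, upstream_names in config.items():
        for upstream in upstream_names:
            tasks[upstream] >> tasks[name]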

Airflow Scheduler not picking up DAG Runs

孤人 submitted on 2019-12-11 07:16:28
Question: I'm setting up Airflow so that the webserver runs on one machine and the scheduler runs on another. Both share the same MySQL metastore database. Both instances come up without any errors in the logs, but the scheduler is not picking up any DAG Runs that are created by manually triggering the DAGs via the web UI. The dag_run table in MySQL shows a few entries, all in the running state:

    mysql> select * from dag_run;
    +----+--------------------------------+----------------------------+---------+-------------

Running Airflow Tasks In Parallel - Nothing Gets Scheduled

放肆的年华 submitted on 2019-12-11 07:08:18
Question: I just went through the process of configuring my Airflow setup to be capable of parallel processing by following this article and using this article. Everything seems to be working, in the sense that I was able to run all of the commands from the articles without any errors, warnings, or exceptions. I was able to start up the airflow webserver and airflow scheduler, and I can open the UI and view all my DAGs, but now none of the DAGs that previously worked are starting. I had
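When a previously working setup stops scheduling anything after a parallelism change, the usual suspects are the executor and metadata-database settings. A hedged checklist of the airflow.cfg values that typically matter (values are illustrative, not the asker's actual configuration):

    [core]
    # The SequentialExecutor cannot run tasks in parallel.
    executor = CeleryExecutor
    # SQLite only supports the SequentialExecutor; use MySQL or Postgres.
    sql_alchemy_conn = mysql://user:pass@host/airflow
    # Upper bounds on concurrently running task instances.
    parallelism = 32
    dag_concurrency = 16

    [celery]
    broker_url = redis://localhost:6379/0
    result_backend = db+mysql://user:pass@host/airflow

With the CeleryExecutor, at least one "airflow worker" process must also be running alongside the webserver and scheduler, or triggered tasks will sit queued forever.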

schedule_interval and other gotchas with SubDagOperator

为君一笑 submitted on 2019-12-11 06:04:21
Question: The Airflow documentation clearly states that SubDAGs must have a schedule and be enabled: "If the SubDAG's schedule is set to None or @once, the SubDAG will succeed without having done anything." Although we must stick to the documentation, I've found they work without a hiccup even with schedule_interval set to None or @once. Here's my working example. My current understanding (I heard about Airflow only 2 weeks ago) of SubDagOperators (or subdags) is that Airflow treats a subdag as just another task
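For concreteness, a hedged sketch of the usual factory pattern (hypothetical names, Airflow 1.10-style imports): the subdag's dag_id must be '<parent_dag_id>.<subdag_task_id>', and the documented guidance is to give it the same schedule_interval and start_date as the parent.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.subdag_operator import SubDagOperator

    DEFAULT_ARGS = {'start_date': datetime(2019, 1, 1)}


    def make_subdag(parent_dag_id, task_id, schedule_interval):
        subdag = DAG(
            dag_id='{}.{}'.format(parent_dag_id, task_id),  # required naming
            schedule_interval=schedule_interval,
            default_args=DEFAULT_ARGS,
        )
        DummyOperator(task_id='inner_task', dag=subdag)
        return subdag


    parent = DAG(
        dag_id='parent_dag',
        schedule_interval='@daily',
        default_args=DEFAULT_ARGS,
    )

    section = SubDagOperator(
        task_id='section_1',
        subdag=make_subdag('parent_dag', 'section_1', parent.schedule_interval),
        dag=parent,
    )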