airflow-scheduler

Airflow - Proper way to handle DAGs callbacks

Submitted by 戏子无情 on 2019-12-24 11:12:16
Question: I have a DAG, and whenever it succeeds or fails I want it to trigger a method that posts to Slack. My DAG args are like below:

    default_args = {
        [...]
        'on_failure_callback': slack.slack_message(sad_message),
        'on_success_callback': slack.slack_message(happy_message),
        [...]
    }

And the DAG definition itself:

    dag = DAG(
        dag_id=dag_name_id,
        default_args=default_args,
        description='load data from mysql to S3',
        schedule_interval='*/10 * * * *',
        catchup=False
    )

But when I check Slack there is …
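The snippet above evaluates slack.slack_message(...) once, when the DAG file is parsed, and assigns its return value as the callback; that is the usual reason these notifications misbehave. A minimal sketch of the callable form, reusing the hypothetical slack.slack_message() helper and the sad_message/happy_message variables from the question:

    # Sketch only: pass callables, not the result of calling them. Airflow invokes
    # the callback with a context dict when the task instance finishes.
    def notify_failure(context):
        slack.slack_message(sad_message)     # sad_message as in the question

    def notify_success(context):
        slack.slack_message(happy_message)   # happy_message as in the question

    default_args = {
        # [...] other args from the question
        'on_failure_callback': notify_failure,   # no parentheses: a reference, not a call
        'on_success_callback': notify_success,
    }

Note that callbacks placed in default_args are applied per task; a single DAG-level notification is usually attached through the DAG constructor's own callback arguments in recent Airflow versions.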

Programmatically clear the state of airflow task instances

Submitted by 懵懂的女人 on 2019-12-24 00:59:32
Question: I want to clear the tasks in DAG B when DAG A completes execution. Both A and B are scheduled DAGs. Is there any operator or way to clear the state of tasks and re-run DAG B programmatically? I'm aware of the CLI option and the Web UI option to clear the tasks.

Answer 1: cli.py is an incredibly useful place to peek into the SQLAlchemy magic of Airflow. The clear command is implemented there:

    @cli_utils.action_logging
    def clear(args):
        logging.basicConfig(
            level=settings.LOGGING_LEVEL,
            format=settings.SIMPLE…
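The quoted answer is cut off, but the CLI's clear command ultimately drives the same DAG.clear() method, which can also be called from a task at the end of DAG A. A minimal sketch under Airflow 1.x; the dag_id and the date window are assumptions for illustration:

    # Sketch: clear DAG B's task instances so the scheduler re-runs them.
    # Could be wrapped in a PythonOperator placed at the end of DAG A.
    from datetime import datetime
    from airflow.models import DagBag

    def clear_dag_b(**kwargs):
        dag_b = DagBag().get_dag('dag_b')        # assumed dag_id of DAG B
        dag_b.clear(
            start_date=datetime(2019, 12, 23),   # assumed execution window to clear
            end_date=datetime(2019, 12, 24),
        )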

Airflow: Re-run DAG from beginning with new schedule

Submitted by 十年热恋 on 2019-12-23 09:56:27
Question: Backstory: I was running an Airflow job on a daily schedule, with a start_date of July 1, 2019. The job requested each day's data from a third party, then loaded that data into our database. After running the job successfully for several days, I realized that the third-party data source only refreshed its data once a month. As such, I was simply downloading the same data every day. At that point, I changed the start_date to a year ago (to get previous months' info), and changed the …

How do I clear the state of a dag run with the CLI in airflow/composer?

Submitted by 别来无恙 on 2019-12-23 00:52:12
Question: I thought I could use the command:

    g beta composer environments run <env> --location=us-central1 clear -- <dag_id> -s 2018-05-13 -e 2018-05-14

to clear the state of the dag runs on 2018-05-13. For some reason it doesn't work. What happens is that the CLI hangs on a message like: "kubeconfig entry generated for <kube node name>". What is the expected behavior of the command above? I would expect it to clear the dag runs for the interval, but I might be doing something wrong.

Answer 1: Running clear …
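The answer above is truncated. One detail worth checking (an assumption on my part, not the original answer): Airflow's clear subcommand normally asks for an interactive yes/no confirmation, which can look like a hang when run through gcloud; the Airflow 1.x CLI exposes a --no_confirm flag to skip that prompt:

    # Sketch: same command with the confirmation prompt disabled
    gcloud beta composer environments run <env> --location=us-central1 \
        clear -- <dag_id> -s 2018-05-13 -e 2018-05-14 --no_confirm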

How do I add a new dag to a running airflow service?

Submitted by 断了今生、忘了曾经 on 2019-12-22 10:01:40
Question: I have an Airflow service that is currently running as separate Docker containers for the webserver and the scheduler, both backed by a Postgres database. I have the dags synced between the two instances, and the dags load appropriately when the services start. However, if I add a new dag to the dag folder (on both containers) while the service is running, the dag gets loaded into the dagbag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that …
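Not from the truncated question, but relevant background: the Airflow scheduler rescans the DAGs folder on a fixed interval, so a newly added file should normally be picked up without re-running airflow initdb once that scan happens. The interval is set in airflow.cfg:

    [scheduler]
    # how often (in seconds) the scheduler scans the DAGs directory for new files
    dag_dir_list_interval = 300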

Airflow latency between tasks

Submitted by 落爺英雄遲暮 on 2019-12-22 08:34:59
Question: As you can see in the image, Airflow spends too much time between task executions; it represents almost 30% of the DAG execution time. I've changed the airflow.cfg file to:

    job_heartbeat_sec = 1
    scheduler_heartbeat_sec = 1

but I still have the same latency rate. Why does it behave this way?

Answer 1: It is by design. For instance, I use Airflow to perform large workflows where some tasks can take a really long time. Airflow is not meant for tasks that take seconds to execute; it can be …
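For context (the question does not show the surrounding file, so the section placement is stated here as an assumption): both of those keys belong in the [scheduler] section of airflow.cfg:

    [scheduler]
    # how often (seconds) running task instances check for an external kill signal
    job_heartbeat_sec = 1
    # how often (seconds) the scheduler loop runs and tries to trigger new task instances
    scheduler_heartbeat_sec = 1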

How to run one airflow task and all its dependencies?

Submitted by 我只是一个虾纸丫 on 2019-12-22 03:50:51
Question: I suspected that airflow run dag_id task_id execution_date would run all upstream tasks, but it does not. It simply fails when it sees that not all dependent tasks have run. How can I run a specific task and all of its dependencies? I am guessing this is not possible because of an Airflow design decision, but is there a way to get around it?

Answer 1: You can run a task independently by using the -i/-I/-A flags along with the run command. But yes, the design of Airflow does not permit running a …
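The answer is cut off above. One way to approximate "run this task and everything upstream of it" is to walk the upstream tasks yourself and invoke airflow run for each; a rough sketch under Airflow 1.x, where the dag_id, task_id and execution date are placeholders:

    # Sketch: run a task and all of its upstream tasks in topological order,
    # using -i so each invocation skips the normal dependency checks.
    import subprocess
    from airflow.models import DagBag

    dag = DagBag().get_dag('my_dag')          # placeholder dag_id
    target = dag.get_task('my_task')          # placeholder task_id
    wanted = {t.task_id for t in target.get_flat_relatives(upstream=True)}
    wanted.add(target.task_id)

    for task in dag.topological_sort():       # upstream tasks come first
        if task.task_id in wanted:
            subprocess.check_call(
                ['airflow', 'run', '-i', dag.dag_id, task.task_id, '2019-12-22']
            )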

Incorrect work of scheduler interval and start time in Apache Airflow

Submitted by 微笑、不失礼 on 2019-12-21 06:26:13
Question: I can't find the solution for the start time of tasks. I have code and can't find where I'm wrong. When I ran the DAG, the tasks for 25.03, 26.03 and 27.03 were completed, but today (28.03) the tasks did not start at 6:48. I have tried cron expressions, pendulum and datetime, and the result is the same. The local time (UTC+3) and Airflow's time (UTC) are different. I've tried using each time (local, Airflow) in 'start_date' or 'schedule_interval', with no result. Using: Ubuntu, Airflow v1.9.0 and the local executor.

    emailname = …
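The poster's code is cut off (it ends at emailname =), so the following is only a sketch of the usual fix for Airflow 1.9, which stores all dates as naive UTC: express the schedule in UTC, and remember that a run for an interval only starts once that interval has ended. The dag_id and start_date below are placeholders:

    # Sketch: a job meant to fire at 06:48 local time (UTC+3) must be scheduled
    # at 03:48 UTC in Airflow 1.9, since the scheduler works in UTC.
    from datetime import datetime
    from airflow import DAG

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2018, 3, 24),   # assumed start date, naive UTC
    }

    dag = DAG(
        dag_id='daily_0648_local',              # placeholder dag_id
        default_args=default_args,
        schedule_interval='48 3 * * *',         # 03:48 UTC == 06:48 UTC+3
    )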
