airflow-scheduler

Airflow 1.10 - Scheduler Startup Fails

旧街凉风 submitted on 2019-12-20 02:19:07
Question: I've just painfully installed Airflow 1.10, thanks to my previous post here. We have a single EC2 instance running, our queue is AWS ElastiCache Redis, and our metadata database is AWS RDS for PostgreSQL. Airflow works with this setup just fine on version 1.9, but on version 1.10 we hit an issue when starting the scheduler:

[2018-08-15 16:29:14,015] {jobs.py:385} INFO - Started process (PID=15778) to work on /home/ec2-user/airflow/dags/myDag.py

Airflow depends_on_past for whole DAG

纵然是瞬间 submitted on 2019-12-19 07:06:31
Question: Is there a way in Airflow to apply depends_on_past to an entire DagRun, not just to a Task? I have a daily DAG, and the Friday DagRun errored on the 4th task, yet the Saturday and Sunday DagRuns still ran as scheduled. Using depends_on_past = True would have paused the DagRun at the same 4th task, but the first 3 tasks would still have run. I can see in the DagRun DB table there is a state column that contains failed for the Friday DagRun. What I want is a way of configuring
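A minimal sketch of the usual workaround for this situation (the DAG id, tasks, and schedule are assumptions, not taken from the question): depends_on_past=True makes each task wait for its own previous instance to succeed, and wait_for_downstream=True additionally makes a task wait until the downstream tasks of its previous instance have succeeded, which comes closer to holding back a whole DagRun.

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'depends_on_past': True,      # each task waits for its own previous run
    'wait_for_downstream': True,  # ...and for the downstream tasks of that run
    'start_date': datetime(2018, 1, 1),
}

dag = DAG('daily_example', default_args=default_args, schedule_interval='@daily')

t1 = DummyOperator(task_id='task_1', dag=dag)
t2 = DummyOperator(task_id='task_2', dag=dag)
t1 >> t2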

Airflow: Why is there a start_date for operators?

感情迁移 submitted on 2019-12-19 02:11:23
Question: I don't understand why we need a 'start_date' for the operators (task instances). Shouldn't the one that we pass to the DAG suffice? Also, if the current time is 7th Feb 2018 8.30 am UTC and I now set the start_date of the DAG to 7th Feb 2018 0.00 am, with a cron schedule interval of 30 9 * * * (daily at 9.30 am, i.e. expecting it to run within the next hour), will my DAG run today at 9.30 am or tomorrow (8th Feb at 9.30 am)? Answer 1: Regarding start_date on task instance, personally
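A short sketch of the setup the question describes (the DAG id and task are assumptions). Airflow only triggers a DagRun after its schedule interval has closed, so with this configuration the run whose execution_date is 2018-02-07 09:30 actually starts around 2018-02-08 09:30.

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    'start_date_example',
    start_date=datetime(2018, 2, 7),   # DAG-level start_date
    schedule_interval='30 9 * * *',    # daily at 09:30
)

# Operators inherit the DAG's start_date unless one is set explicitly on the task.
run_me = DummyOperator(task_id='run_me', dag=dag)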

Airflow: Log file isn't local, Unsupported remote log location

删除回忆录丶 submitted on 2019-12-18 14:51:55
Question: I am not able to see the logs attached to the tasks from the Airflow UI. The log-related settings in the airflow.cfg file are:

remote_base_log_folder =
base_log_folder = /home/my_projects/ksaprice_project/airflow/logs
worker_log_server_port = 8793
child_process_log_directory = /home/my_projects/ksaprice_project/airflow/logs/scheduler

Although I am setting remote_base_log_folder it is trying to fetch the log from http://:8793/log/tutorial/print_date/2017-08-02T00:00:00 - I don't understand this behavior.

Airflow dynamic tasks at runtime

梦想的初衷 submitted on 2019-12-18 14:48:15
Question: Other questions about 'dynamic tasks' seem to address dynamic construction of a DAG at schedule or design time. I'm interested in dynamically adding tasks to a DAG during execution.

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

dag = DAG('test_dag', description='a test',
          schedule_interval='0 0 * * *',
          start_date=datetime(2018, 1, 1), catchup=False)

def make_tasks():
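For contrast, a minimal sketch of the parse-time pattern the excerpt says other questions cover (the DAG id, task names, and count are assumptions): the DAG file is re-evaluated by the scheduler, so tasks built in a loop like this are "dynamic" only at parse time, not while a DagRun is executing.

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG('parse_time_dynamic', schedule_interval='0 0 * * *',
          start_date=datetime(2018, 1, 1), catchup=False)

# Tasks are created when the scheduler parses this file, not during execution.
for i in range(3):
    DummyOperator(task_id='generated_task_{}'.format(i), dag=dag)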

Problem updating the connections in Airflow programmatically

安稳与你 submitted on 2019-12-18 09:55:06
Question: I am trying to update the Airflow connections using Python. I have created a Python function that takes an authentication token from an API and updates the extra field of a connection in Airflow. I am getting the token in JSON format like below: { "token" : token_value } Below is the part of the Python code that I am using:

def set_token():
    # Get token from API & update the Airflow Variables
    Variable.set("token", str(auth_token))
    new_token = Variables.get("token")
    get_conn = Connection(conn_id="test_conn")
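A minimal sketch of one common way to update an existing connection's extra field in Airflow 1.x by going through the metadata-database session (the connection id and token value here are assumptions for illustration):

import json
from airflow import settings
from airflow.models import Connection

def update_conn_extra(conn_id, token):
    # Open a session against the Airflow metadata database.
    session = settings.Session()
    conn = session.query(Connection).filter(Connection.conn_id == conn_id).one()
    conn.extra = json.dumps({"token": token})  # store the token in the extra field
    session.add(conn)
    session.commit()
    session.close()

update_conn_extra("test_conn", "my_token_value")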

DAGs not clickable on Google Cloud Composer webserver, but working fine on a local Airflow

左心房为你撑大大i submitted on 2019-12-17 19:51:31
Question: I'm using Google Cloud Composer (managed Airflow on Google Cloud Platform) with image version composer-0.5.3-airflow-1.9.0 and Python 2.7, and I'm facing a weird issue: after importing my DAGs, they are not clickable from the Web UI (and there are no "Trigger DAG", "Graph view", ... buttons), while everything works perfectly when running a local Airflow. Even though they are not usable from the webserver on Composer, my DAGs still exist. I can list them using the CLI ( list_dags ), describe them ( list_tasks ) and

Rabbitmq /usr/local/etc/rabbitmq/rabbitmq-env.conf Missing

痴心易碎 submitted on 2019-12-13 03:38:39
Question: I just installed RabbitMQ on an AWS EC2 instance (CentOS) using the following:

sudo yum install erlang
sudo yum install rabbitmq-server

I was then able to successfully turn it on using:

sudo chkconfig rabbitmq-server on
sudo /sbin/service rabbitmq-server start

...and

sudo /sbin/service rabbitmq-server stop
sudo rabbitmq-server    # run in foreground

But now I'm trying to modify the /usr/local/etc/rabbitmq/rabbitmq-env.conf file so I can change the NODE_IP_ADDRESS, but the file is nowhere to

Airflow EC2-Instance socket.getfqdn() Bug

爷,独闯天下 submitted on 2019-12-13 00:16:52
Question: I'm using Airflow version 1.9, and there is a bug in their software that you can read about here on my previous Stackoverflow post, as well as here on another one of my Stackoverflow posts, and here on Airflow's Github where the bug is reported and discussed. Long story short, there are a few locations in Airflow's code where it needs to get the IP address of the server. They accomplish this by running this command: socket.getfqdn() The problem is that on Amazon EC2 instances (Amazon Linux 1)
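A small sketch of the call in question and a commonly suggested alternative for resolving the machine's own address on EC2 (this illustrates the general workaround, not the exact patch applied to Airflow):

import socket

# What Airflow 1.9 calls internally; on some EC2 / Amazon Linux hosts this
# returns a fully qualified name that other instances cannot resolve.
print(socket.getfqdn())

# A common alternative: resolve the host's own name to its IP address.
print(socket.gethostbyname(socket.gethostname()))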