airflow-scheduler

Use the Airflow Hive operator and output to a text file

Submitted by 一世执手 on 2020-01-14 22:26:02
Question: Hi, I want to execute a Hive query using the Airflow Hive operator and write the result to a file. I don't want to use INSERT OVERWRITE here.

    hive_ex = HiveOperator(
        task_id='hive-ex',
        hql='/sql/hive-ex.sql',
        hiveconfs={
            'DAY': '{{ ds }}',
            'YESTERDAY': '{{ yesterday_ds }}',
            'OUTPUT': '{{ file_path }}' + 'csv',
        },
        dag=dag,
    )

What is the best way to do this? I know how to do it using the Bash operator, but I want to know whether we can use the Hive operator:

    hive_ex = BashOperator(
        task_id='hive-ex',
        bash_command= …
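The question is cut off above, but one way to get query results into a file without INSERT OVERWRITE is to skip the HiveOperator and call a Hive hook from a PythonOperator. The sketch below assumes Airflow 1.10.x, an existing hiveserver2_default connection, and a made-up table and output path; treat it as a starting point, not the accepted answer.

    # Sketch: stream Hive query results into a CSV file via HiveServer2Hook.
    # Assumes a working 'hiveserver2_default' connection and a 'dag' object in scope.
    from airflow.hooks.hive_hooks import HiveServer2Hook
    from airflow.operators.python_operator import PythonOperator

    def hive_to_file(**context):
        hook = HiveServer2Hook(hiveserver2_conn_id='hiveserver2_default')
        hook.to_csv(
            hql="SELECT * FROM my_table WHERE day = '{}'".format(context['ds']),  # hypothetical table
            csv_filepath='/tmp/hive_ex_{}.csv'.format(context['ds']),             # hypothetical path
        )

    hive_to_file_task = PythonOperator(
        task_id='hive-to-file',
        python_callable=hive_to_file,
        provide_context=True,   # required in Airflow 1.x so **context is passed in
        dag=dag,
    )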

How to delete XCOM objects once the DAG finishes its run in Airflow

Submitted by 醉酒当歌 on 2020-01-14 08:11:11
Question: I have a huge JSON payload in XCom that I no longer need once the DAG execution is finished, but I still see the XCom object in the UI with all the data. Is there any way to delete the XCom programmatically once the DAG run is finished? Thank you.

Answer 1: You have to add a task, depending on your metadata DB (SQLite, PostgreSQL, MySQL, ...), that deletes the XCom once the DAG run is finished.

    delete_xcom_task = PostgresOperator(
        task_id='delete-xcom-task',
        postgres_conn_id='airflow_db',
        sql="delete from …
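The SQL in the answer is truncated; a commonly used version of this cleanup task deletes the current DAG's rows from the xcom metadata table. The sketch below assumes a Postgres metadata database reachable through an airflow_db connection, and is an illustration rather than the original answer's exact statement.

    # Sketch: clear this DAG's XCom rows once everything else has finished.
    # Assumes the metadata DB is Postgres and an 'airflow_db' connection points at it.
    from airflow.operators.postgres_operator import PostgresOperator

    delete_xcom_task = PostgresOperator(
        task_id='delete-xcom-task',
        postgres_conn_id='airflow_db',
        sql="DELETE FROM xcom WHERE dag_id = '{{ dag.dag_id }}'",
        dag=dag,
    )

    # Wire it in last so it only runs after the tasks that produced the XComs:
    # producing_task >> delete_xcom_task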

Dynamically creating DAGs based on rows available from a DB connection

Submitted by 六眼飞鱼酱① on 2020-01-06 05:38:31
Question: I want to create DAGs dynamically from a database table query. When I generate DAGs dynamically from a fixed numeric range, or from objects available in the Airflow settings, it works. However, when I use a PostgresHook and create a DAG for each row of my table, I can see a new DAG generated whenever I add a new row, but it turns out that I can't click the newly created DAG in my Airflow webserver UI. For more context, I'm …
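Since the question is cut off, here is a minimal sketch of the pattern it describes: one DAG per row of a configuration table, registered in globals() so the DagBag can discover it. The connection and table names are hypothetical, and note that the query runs every time the file is parsed, on the scheduler and on the webserver, so both processes must be able to reach the database or the UI will not show the DAG correctly.

    # Sketch: generate one DAG per row of a config table at parse time.
    from datetime import datetime
    from airflow import DAG
    from airflow.hooks.postgres_hook import PostgresHook
    from airflow.operators.dummy_operator import DummyOperator

    def build_dag(dag_id):
        dag = DAG(dag_id,
                  start_date=datetime(2020, 1, 1),
                  schedule_interval='@daily',
                  catchup=False)
        DummyOperator(task_id='start', dag=dag)
        return dag

    hook = PostgresHook(postgres_conn_id='my_config_db')                          # hypothetical connection
    for (dag_name,) in hook.get_records("SELECT dag_name FROM dag_config"):       # hypothetical table
        dag_id = 'generated_{}'.format(dag_name)
        globals()[dag_id] = build_dag(dag_id)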

Need to access schedule time in DockerOperator in Airflow

Submitted by 两盒软妹~` on 2020-01-06 05:08:39
Question: I need to access the schedule time in Airflow's DockerOperator. For example:

    t1 = DockerOperator(
        task_id="task",
        dag=dag,
        image="test-runner:1.0",
        docker_url="xxx.xxx.xxx.xxx:2376",
        environment={"FROM": "{{ (execution_date + macros.timedelta(hours=6, minutes=30)).isoformat() }}"},
    )

Basically, I need to pass the schedule time into the container as a Docker environment variable.

Answer 1: First, macros only work if the field is listed in template_fields. Second, you need to check which version of Airflow you are using; if you are using 1.9 or below, …
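Following on from the answer: if the installed DockerOperator does not list environment in its template_fields (older releases only template command), one workaround is to subclass it and extend template_fields, as in the sketch below. This is written under that assumption; check the operator source for your exact version before relying on it.

    # Sketch: make the 'environment' dict templatable by extending template_fields.
    from airflow.operators.docker_operator import DockerOperator

    class TemplatedEnvDockerOperator(DockerOperator):
        # Appending is harmless if 'environment' is already templated in your version.
        template_fields = tuple(DockerOperator.template_fields) + ('environment',)

    t1 = TemplatedEnvDockerOperator(
        task_id='task',
        image='test-runner:1.0',
        docker_url='xxx.xxx.xxx.xxx:2376',
        environment={'FROM': '{{ (execution_date + macros.timedelta(hours=6, minutes=30)).isoformat() }}'},
        dag=dag,
    )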

Airflow DAG dynamic structure

Submitted by 寵の児 on 2020-01-04 14:24:09
Question: I was looking for a solution where I can decide the DAG structure when the DAG is triggered, as I'm not sure about the number of operators that I'll have to run. Please refer below for the execution sequence that I'm planning to create:

              |-- Task B.1 --|              |-- Task C.1 --|
              |-- Task B.2 --|              |-- Task C.2 --|
    Task A -->|-- Task B.3 --|--> Task B -->|-- Task C.3 --|
              |     ....     |              |     ....     |
              |-- Task B.N --|              |-- Task C.N --|

I'm not sure about the value of N. Is this possible in Airflow? If so, how do I …
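Classic Airflow builds the graph when the DAG file is parsed, so the usual approach is to resolve N at parse time (from a Variable, a config file, or a database) and build the fan-out/fan-in with a loop. A minimal sketch, with N hard-coded as a placeholder:

    # Sketch: Task A fans out to B.1..B.N, converges on Task B, then fans out to C.1..C.N.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    N = 5  # placeholder; could come from Variable.get() or a config table

    dag = DAG('dynamic_structure', start_date=datetime(2020, 1, 1),
              schedule_interval=None, catchup=False)

    task_a = DummyOperator(task_id='task_a', dag=dag)
    task_b = DummyOperator(task_id='task_b', dag=dag)

    for i in range(1, N + 1):
        b_i = DummyOperator(task_id='task_b_{}'.format(i), dag=dag)
        c_i = DummyOperator(task_id='task_c_{}'.format(i), dag=dag)
        task_a >> b_i >> task_b >> c_i

Deciding N only at trigger time (e.g. from dag_run.conf) does not work with this pattern, because the structure is fixed once the file is parsed.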

How do I use the --conf option in Airflow?

Submitted by 混江龙づ霸主 on 2020-01-01 09:26:06
Question: I am trying to run an Airflow DAG and need to pass some parameters to the tasks. How do I read, in the Python DAG file, the JSON string passed as the --conf parameter of the command-line trigger_dag command? Example:

    airflow trigger_dag 'dag_name' -r 'run_id' --conf '{"key":"value"}'

Answer 1: Two ways. From inside a template field or file:

    {{ dag_run.conf['key'] }}

Or, when the context is available, e.g. within a python_callable of the PythonOperator:

    context['dag_run'].conf['key']

Source: https:/ …
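Both forms from the answer are shown in context in the sketch below, assuming a hypothetical 'dag_name' DAG triggered with a conf payload of {"key": "value"}. Note that dag_run.conf is only populated for triggered runs.

    # Sketch: read --conf values in a templated field and in a python_callable.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('dag_name', start_date=datetime(2020, 1, 1), schedule_interval=None)

    # Rendered at runtime from the trigger's conf payload.
    echo_conf = BashOperator(
        task_id='echo_conf',
        bash_command="echo {{ dag_run.conf['key'] }}",
        dag=dag,
    )

    def read_conf(**context):
        # Same value, read from the task context instead of a template.
        print(context['dag_run'].conf['key'])

    print_conf = PythonOperator(
        task_id='print_conf',
        python_callable=read_conf,
        provide_context=True,
        dag=dag,
    )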

Airflow: pass a parameter from the CLI

Submitted by 两盒软妹~` on 2019-12-31 12:43:07
Question: Is there a way to pass a parameter to airflow trigger_dag dag_name {param}? I have a script that monitors a directory for files; when a file gets moved into the target directory, I want to trigger the DAG, passing the file path as a parameter.

Answer 1: You can pass it like this (the JSON must be quoted):

    airflow trigger_dag --conf '{"file_variable": "/path/to/file"}' dag_id

Then, in your DAG, you can access this variable using templating as follows:

    {{ dag_run.conf.file_variable }}

If this doesn't work, sharing a simple …
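For the directory-watching script itself, one possible sketch of the triggering side is below; the DAG id 'process_file' and the path are made up, and it simply shells out to the same CLI command shown in the answer.

    # Sketch: trigger the DAG from a watcher script, passing the file path in --conf.
    # Assumes the 'airflow' CLI is on PATH and a DAG named 'process_file' exists.
    import json
    import subprocess

    def trigger_for_file(file_path):
        conf = json.dumps({"file_variable": file_path})
        subprocess.check_call(
            ["airflow", "trigger_dag", "--conf", conf, "process_file"]
        )

    trigger_for_file("/data/incoming/report.csv")  # illustrative path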

Dynamic DAGs not getting added by the scheduler

Submitted by ℡╲_俬逩灬. on 2019-12-31 05:38:07
Question: I am trying to create dynamic DAGs and get them picked up by the scheduler. I tried the reference from https://www.astronomer.io/guides/dynamically-generating-dags/, which works well, and I changed it a bit as in the code below. I need help debugging the issue. What I tried: 1. Test-running the file. The DAG code gets executed and printing globals() shows all the DAG objects, but somehow they are not listed by list_dags or in the UI.

    from datetime import datetime, timedelta
    import requests
    import json
    from airflow …
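The code in the question is cut off, but the usual reasons a test run prints the DAG objects while list_dags and the UI miss them are that the DAG objects never land in module-level globals when the scheduler parses the file, or that the generation step (an API call or DB query) fails in the scheduler's environment. A bare-bones sketch of the pattern from the Astronomer guide, with placeholder names:

    # Sketch: DAG objects must be module-level globals at import time,
    # in every process that parses DAG files (scheduler and webserver).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    def create_dag(dag_id):
        dag = DAG(dag_id,
                  start_date=datetime(2020, 1, 1),
                  schedule_interval='@daily',
                  catchup=False)
        DummyOperator(task_id='start', dag=dag)
        return dag

    # If this list comes from requests/json as in the question, that call must
    # also succeed (and return the same data) when the scheduler imports the file.
    for name in ['alpha', 'beta', 'gamma']:   # placeholder values
        dag_id = 'dynamic_{}'.format(name)
        globals()[dag_id] = create_dag(dag_id)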

AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname

Submitted by 佐手、 on 2019-12-29 08:21:50
Question: I'm running Airflow in a clustered environment on two AWS EC2 instances, one for the master and one for the worker. The worker node, though, periodically throws this error when running "$ airflow worker":

    [2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
    Traceback (most recent call last):
      File "/usr/bin/airflow", line 27, in <module>
        args.func(args)
      File "/usr/local/lib/python3.6 …
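The warning means the hostname recorded on the task instance when it was queued and the hostname the worker resolves now disagree (a short name versus a fully qualified name here). One possible approach is to pin every Airflow process to the same resolver via the hostname_callable setting; the module path below is illustrative, and fixing /etc/hosts or DNS so both forms agree is an alternative.

    # Sketch: a single hostname resolver shared by every Airflow process.
    # Save as e.g. hostname_resolver.py on the PYTHONPATH of all nodes (path is
    # illustrative) and point airflow.cfg at it:
    #
    #   [core]
    #   hostname_callable = hostname_resolver:resolve
    #
    # (Airflow 1.10 expects the "module:function" form.)
    import socket

    def resolve():
        # Always return the fully qualified name, so the value recorded at
        # queue time and the value the worker checks later agree.
        return socket.getfqdn()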

catchup = False: why are two scheduled runs still created?

Submitted by 本小妞迷上赌 on 2019-12-24 19:23:06
Question: I have a simple DAG (Airflow v1.10.16, using the SequentialExecutor on a localhost machine) with start_date set in the past and catchup = False:

    default_args = {'owner': 'test_user',
                    'start_date': datetime(2019, 12, 1, 1, 00, 00)}
    graph1 = DAG(dag_id='test_dag', default_args=default_args,
                 schedule_interval=timedelta(days=1), catchup=False)
    t = PythonOperator(task_id='t', python_callable=my_func, dag=graph1)

As per the code comments, ":param catchup: Perform scheduler catchup (or only run latest)?" I expected that when the …
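The question is cut off, but it helps to make the interval semantics visible: in Airflow 1.x the run stamped with execution_date X only starts after the interval [X, X + schedule_interval] has closed, and with catchup=False only the most recent closed interval is scheduled when the DAG is switched on, followed by a new run each time another interval closes. A small sketch of a callable that logs the interval each run covers; this illustrates the semantics, it is not a confirmed explanation of the two runs seen here.

    # Sketch: log which schedule interval each run covers.
    def my_func(**context):
        print("execution_date (interval start):", context['execution_date'])
        print("next_execution_date (interval end):", context['next_execution_date'])

    # In Airflow 1.x, pass provide_context=True to the PythonOperator so the
    # kwargs above are supplied:
    # t = PythonOperator(task_id='t', python_callable=my_func,
    #                    provide_context=True, dag=graph1)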