airflow

Use an Airflow connection from a Jinja template

Submitted by 我怕爱的太早我们不能终老 on 2021-02-07 20:21:16
Question: I'm trying to pass DB params to a BashOperator using environment variables, but I can't find any documentation or examples of how to use a connection from a Jinja template. I'm looking for something similar to Variables, e.g. echo {{ var.value.<variable_name> }}

Answer 1: Airflow does not provide such macros. However, you can create custom macros to address this. Connection example, creating the macros:

def get_host(conn_id):
    connection = BaseHook.get_connection(conn_id)
    return connection.host

def get_schema
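The answer excerpt above is cut off, so here is a minimal sketch of how such custom macros are typically registered on a DAG via user_defined_macros and then used to pass DB params to a BashOperator through environment variables. Airflow 1.10-style imports are assumed, and the connection id 'my_db_conn' is illustrative.

from datetime import datetime

from airflow import DAG
from airflow.hooks.base_hook import BaseHook
from airflow.operators.bash_operator import BashOperator


def get_host(conn_id):
    # Look up the connection defined in the Airflow UI and return its host.
    return BaseHook.get_connection(conn_id).host


def get_password(conn_id):
    return BaseHook.get_connection(conn_id).password


dag = DAG(
    dag_id='use_connection_in_template',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    # Register the helpers so they can be called from Jinja templates.
    user_defined_macros={'get_host': get_host, 'get_password': get_password},
)

# 'env' is a templated field on BashOperator, so the macro is rendered here.
print_host = BashOperator(
    task_id='print_db_host',
    bash_command='echo "connecting to $DB_HOST"',
    env={'DB_HOST': '{{ get_host("my_db_conn") }}'},
    dag=dag,
)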

Airflow - Locking between tasks so that only one parallel task runs at a time?

Submitted by ≡放荡痞女 on 2021-02-07 19:30:43
Question: I have one DAG that has three task streams (licappts, agents, agentpolicy). For simplicity I'm calling these three distinct streams. The streams are independent in the sense that just because agentpolicy failed doesn't mean the other two (licappts and agents) should be affected by that failure. But for the sourceType_emr_task_1 tasks (i.e., licappts_emr_task_1, agents_emr_task_1, and agentpolicy_emr_task_1) I can only run one of these tasks at a time. For example I can't run
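The question is cut off, but the usual way to enforce "only one of these tasks at a time" is an Airflow pool with a single slot. The sketch below assumes a pool named emr_pool has been created (Admin -> Pools) with 1 slot; DummyOperator stands in for the real EMR tasks.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG('three_streams', start_date=datetime(2021, 1, 1), schedule_interval=None)

# All three tasks share the one-slot pool, so Airflow serializes them even
# though the three streams otherwise run independently and in parallel.
licappts_emr_task_1 = DummyOperator(task_id='licappts_emr_task_1', pool='emr_pool', dag=dag)
agents_emr_task_1 = DummyOperator(task_id='agents_emr_task_1', pool='emr_pool', dag=dag)
agentpolicy_emr_task_1 = DummyOperator(task_id='agentpolicy_emr_task_1', pool='emr_pool', dag=dag)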

Error calling BashOperator: Bash command failed

Submitted by 一曲冷凌霜 on 2021-02-07 14:23:38
Question: Here are my DAG file and BashOperator task:

my_dag = DAG(
    dag_id='my_dag',
    start_date=datetime(year=2017, month=3, day=28),
    schedule_interval='0 1 * * *',
)

my_bash_task = BashOperator(
    task_id="my_bash_task",
    bash_command=bash_command,
    dag=my_dag)

bash_command = "/home/jak/my_projects/workflow_env/repo_workflow/db_backup_bash.sh "

Following this answer I even left a space after the path to the bash file to avoid a TemplateNotFound error. But running this task gave me this error: airflow.exceptions.AirflowException: Bash command failed
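"Bash command failed" is the generic AirflowException raised whenever the bash command exits with a non-zero status, so the script itself (permissions, shebang, paths it touches) is usually the thing to check. A hedged sketch of the usual checklist, reusing the script path from the question; the schedule is illustrative.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

my_dag = DAG(dag_id='my_dag', start_date=datetime(2017, 3, 28), schedule_interval='0 1 * * *')

# 1. The trailing space stops Jinja from resolving the .sh path as a template
#    (avoids TemplateNotFound); 2. the script must be executable and readable
#    by the user running the Airflow worker; 3. it must finish with exit code 0.
backup = BashOperator(
    task_id='my_bash_task',
    bash_command='/home/jak/my_projects/workflow_env/repo_workflow/db_backup_bash.sh ',
    dag=my_dag,
)

# Minimal illustration of the failure mode: any non-zero exit code raises
# "Bash command failed" for the task.
always_fails = BashOperator(task_id='always_fails', bash_command='exit 1', dag=my_dag)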

How to add a template variable in the filename of an EmailOperator task? (Airflow)

Submitted by ◇◆丶佛笑我妖孽 on 2021-02-07 06:28:05
Question: I can't seem to get this to work. I am trying to send a given file daily, whose name is like 'file_{{ds_nodash}}.csv'. The problem is that I can't seem to use this as the filename, since the template doesn't get rendered there. In the text of the email or in the subject it works perfectly, but not in the name. Here is the DAG as an example:

local_file = 'file-{{ds_nodash}}.csv'

send_stats_csv = EmailOperator(
    task_id='send-stats-csv',
    to=['email@gmail.com'],
    subject='Subject - {{ ds }}',
    html_content='Here
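A likely explanation, assuming an Airflow version in which 'files' is not among EmailOperator's templated fields, is that only to, subject and html_content are rendered by Jinja. A minimal sketch of the usual workaround, a small subclass that adds 'files' to template_fields (in newer Airflow releases 'files' is already templated and the subclass is unnecessary):

from datetime import datetime

from airflow import DAG
from airflow.operators.email_operator import EmailOperator


class TemplatedFilesEmailOperator(EmailOperator):
    # Add 'files' to the fields Jinja renders, so the date lands in the filename.
    template_fields = EmailOperator.template_fields + ('files',)


dag = DAG('send_daily_stats', start_date=datetime(2021, 1, 1), schedule_interval='@daily')

send_stats_csv = TemplatedFilesEmailOperator(
    task_id='send-stats-csv',
    to=['email@gmail.com'],
    subject='Subject - {{ ds }}',
    html_content='Here are the stats for {{ ds }}',
    files=['file-{{ ds_nodash }}.csv'],
    dag=dag,
)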

Conditional Tasks using PythonOperator and BranchPythonOperator

Submitted by 与世无争的帅哥 on 2021-02-05 12:32:13
Question: Hi guys, I am new to Airflow and Python. I need to run tasks based on the value of a variable in the input JSON. If the value of the variable 'insurance' is "true" then task1, task2 and task3 need to run; otherwise task4, task5 and task6 need to run. Since I am a newbie to this I don't have much idea about the usage of PythonOperator and BranchPythonOperator. This is my input JSON:

{ "car": { "engine_no": "123_st_456", "json": "{\"make\":\"Honda\",\"model\": Jazz, \"insurance\":\"true\",\"pollution\":\
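The JSON excerpt is truncated (and the inner "json" string is not valid JSON as shown), but this kind of branching is normally done with a BranchPythonOperator that inspects the triggering payload and returns the task_id to follow. A minimal sketch under those assumptions, with DummyOperator standing in for the real tasks and the payload read from dag_run.conf:

import json
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

dag = DAG('insurance_branching', start_date=datetime(2021, 1, 1), schedule_interval=None)


def choose_branch(**context):
    # dag_run.conf holds the JSON passed when the DAG is triggered.
    conf = context['dag_run'].conf or {}
    car = json.loads(conf['car']['json'])  # assumes the inner string is valid JSON
    return 'task1' if car.get('insurance') == 'true' else 'task4'


branch = BranchPythonOperator(
    task_id='branch_on_insurance',
    python_callable=choose_branch,
    provide_context=True,  # needed on Airflow 1.10 to receive the context
    dag=dag,
)

task1 = DummyOperator(task_id='task1', dag=dag)  # task2 and task3 would chain off task1
task4 = DummyOperator(task_id='task4', dag=dag)  # task5 and task6 would chain off task4
branch >> [task1, task4]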

Problem with start date and scheduled date in Apache Airflow

Submitted by 情到浓时终转凉″ on 2021-02-05 09:15:46
Question: I am working with Apache Airflow and I have a problem with the scheduled day and the starting day. I want a DAG to run every day at 8:00 AM UTC. So, what I did is:

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 10, 0, 0),
    'email': ['example@emaiil.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5)
}

# never runs
dag = DAG(dag_id='id', default_args=default_args, schedule_interval='0 8 * * *'
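What usually trips people up here is that Airflow triggers a DAG run at the end of its schedule interval, not at the start. With a start_date of 2020-12-07 10:00 and a '0 8 * * *' schedule, the first interval only begins at 2020-12-08 08:00 and the corresponding run fires at 2020-12-09 08:00, which can look like the DAG never runs. A hedged sketch with the start_date aligned to the schedule:

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    # Aligned with the 08:00 UTC schedule: the run with execution_date
    # 2020-12-07 08:00 is triggered at 2020-12-08 08:00, and so on daily.
    'start_date': datetime(2020, 12, 7, 8, 0, 0),
    'retries': 1,
    'retry_delay': timedelta(hours=5),
}

dag = DAG(dag_id='daily_8am_utc', default_args=default_args, schedule_interval='0 8 * * *')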

Apache Airflow - Connection issue to MS SQL Server using pymssql + SQLAlchemy

Submitted by 百般思念 on 2021-02-05 05:57:20
Question: I am facing a problem connecting to an Azure MS SQL Server 2014 database from Apache Airflow 1.10.1 using pymssql. I want to use the MsSqlHook class provided by Airflow, for the convenience of creating my connection in the Airflow UI, and then create a context manager for my connection using SQLAlchemy:

@contextmanager
def mssql_session(dt_conn_id):
    sqla_engine = MsSqlHook(mssql_conn_id=dt_conn_id).get_sqlalchemy_engine()
    session = sessionmaker(bind=sqla_engine)()
    try:
        yield session
    except:
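The excerpt stops at the except clause, so the sketch below is only one plausible way to finish it, assuming the usual culprit: the hook's default URI starts with mssql://, which SQLAlchemy maps to its default pyodbc dialect rather than pymssql. Building the engine with an explicit mssql+pymssql:// URL sidesteps that; the connection attributes still come from the connection defined in the Airflow UI.

from contextlib import contextmanager
from urllib.parse import quote_plus

from airflow.hooks.base_hook import BaseHook
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker


@contextmanager
def mssql_session(dt_conn_id):
    # Read host/port/credentials from the Airflow connection.
    conn = BaseHook.get_connection(dt_conn_id)
    engine = create_engine(
        'mssql+pymssql://{user}:{pwd}@{host}:{port}/{db}'.format(
            user=conn.login,
            pwd=quote_plus(conn.password or ''),
            host=conn.host,
            port=conn.port or 1433,
            db=conn.schema,
        )
    )
    session = sessionmaker(bind=engine)()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()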

How to run spark-submit jobs on a different cluster (1**.1*.0.21) from Airflow (1**.1*.0.35): connecting to a remote cluster from Airflow

Submitted by 走远了吗. on 2021-01-29 22:41:16
Question: I have been trying to run spark-submit jobs from Airflow, but the Spark files are in a different cluster (1**.1*.0.21) and Airflow is in (1**.1*.0.35). I am looking for a detailed explanation of this topic with examples. I can't copy or download any XML files or other files to my Airflow cluster. I have many doubts about using the SSHOperator and BashOperator, and when I try an SSH hook it says: Broken DAG: [/opt/airflow/dags/s.py] No module named paramiko

Answer 1: You can try using Livy. In the following
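The answer breaks off, so here is only a hedged sketch of the Livy-based approach it points at: submit the job to the remote cluster through Livy's REST API from a PythonOperator, so nothing Spark-related (and no paramiko) needs to be installed on the Airflow host. The Livy URL, jar path and class name are illustrative.

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

LIVY_URL = 'http://remote-cluster-host:8998/batches'  # Livy server running on the Spark cluster


def submit_spark_job():
    payload = {
        'file': 'hdfs:///jobs/my_spark_job.jar',   # jar/py file already on the remote cluster
        'className': 'com.example.MySparkJob',
    }
    resp = requests.post(LIVY_URL, json=payload)
    resp.raise_for_status()
    return resp.json()['id']  # Livy batch id, useful for polling the job status later


dag = DAG('remote_spark_submit', start_date=datetime(2021, 1, 1), schedule_interval=None)

submit = PythonOperator(task_id='submit_via_livy', python_callable=submit_spark_job, dag=dag)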