airflow

How to test airflow dag in unittest?

Submitted by 穿精又带淫゛_ on 2021-01-21 18:55:10

Question: I am trying to test a DAG with more than one task in the test environment. I was able to test a single task associated with the DAG, but I want to create several tasks in the DAG and kick off the first task. For testing one task in a DAG I am using task1.run(), which gets executed. But the same approach does not work when I have several tasks chained downstream of one another in the DAG.

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
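A minimal sketch of one way to do this (not the poster's full code; the dag id, task ids, and dates are placeholders, assuming an Airflow 1.10-style setup): calling run() on a single operator does not execute its upstream tasks, so each task is run explicitly in dependency order.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

DEFAULT_DATE = datetime(2021, 1, 1)

dag = DAG("example_test_dag", schedule_interval=None, start_date=DEFAULT_DATE)
task1 = BashOperator(task_id="task1", bash_command="echo first", dag=dag)
task2 = BashOperator(task_id="task2", bash_command="echo second", dag=dag)
task1 >> task2

# BaseOperator.run() executes one operator only; ignore_ti_state=True lets the
# test re-run a task even if a previous TaskInstance already succeeded.
for task in [task1, task2]:
    task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)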

How to run DBT in airflow without copying our repo

Submitted by 耗尽温柔 on 2021-01-21 11:20:01

Question: We use DBT with GCP and BigQuery for transformations in BigQuery, and the simplest approach to scheduling our daily dbt run seems to be a BashOperator in Airflow. Currently we have two separate directories / GitHub projects, one for DBT and another for Airflow. To schedule DBT to run with Airflow, it seems like our entire DBT project would need to be nested inside of our Airflow project, so that we can point to it for our dbt run bash command? Is it possible to trigger our dbt run and dbt
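A hedged sketch of one common workaround (not necessarily the accepted answer): clone the DBT repo at runtime in the BashOperator itself, so the project does not have to live inside the Airflow repo. The repo URL, clone path, profiles location, and DAG settings below are all placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

DBT_REPO = "https://github.com/example-org/dbt-project.git"  # placeholder URL

dag = DAG("dbt_daily", schedule_interval="@daily", start_date=datetime(2021, 1, 1))

dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command=(
        "rm -rf /tmp/dbt && "
        "git clone --depth 1 " + DBT_REPO + " /tmp/dbt && "
        "cd /tmp/dbt && dbt run --profiles-dir ."
    ),
    dag=dag,
)

Other common patterns are baking the DBT project into the worker image at deploy time or calling a hosted DBT service's API; which fits best depends on the deployment.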

Airflow + Nginx set up gives Airflow 404 = lots of circles

Submitted by 天涯浪子 on 2021-01-21 05:37:05

Question: I'm trying to set up Airflow behind nginx, using the instructions given here.

airflow.cfg file:

base_url = https://myorg.com/airflow
web_server_port = 8081
. . .
enable_proxy_fix = True

nginx configuration:

server {
    listen 443 ssl http2 default_server;
    server_name myorg.com;
    . . .
    location /airflow {
        proxy_pass http://localhost:8081;
        proxy_set_header Host $host;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set
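A small diagnostic sketch, not part of the post: requesting the proxied URL without following redirects shows whether nginx and Airflow agree on the /airflow prefix or are bouncing requests in a redirect loop. The URL is the poster's base_url and may need adjusting.

import requests

url = "https://myorg.com/airflow/"  # the poster's base_url
for _ in range(5):
    resp = requests.get(url, allow_redirects=False)
    # Print each hop's status and redirect target to spot a loop or a 404
    print(resp.status_code, resp.headers.get("Location"))
    if resp.status_code not in (301, 302, 303, 307, 308):
        break
    url = resp.headers["Location"]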

How to get last two successful execution dates of Airflow job?

Submitted by ╄→尐↘猪︶ㄣ on 2021-01-16 04:01:45

Question: I need to get the last two successful execution dates of an Airflow job to use in my current run.

Example:

Execution date    Job status
2020-05-03        success
2020-05-04        fail
2020-05-05        success

Question: When I run my job on May 6th, I should get the values of May 3rd and 5th into variables. Is that possible?

Answer 1: You can leverage SQLAlchemy magic for retrieving the execution_dates of the last 'n' successful runs:

from pendulum import Pendulum
from typing import List, Dict, Any, Optional
from airflow.utils.state
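The answer's code is truncated above; a minimal sketch of that approach, querying the metadata DB for the last n successful DagRuns (the dag_id is a placeholder, and airflow.settings.Session is used so it works on both 1.10 and 2.x):

from airflow.models import DagRun
from airflow.settings import Session
from airflow.utils.state import State

def last_successful_execution_dates(dag_id, n=2):
    """Return the execution_date of the last n successful DagRuns, newest first."""
    session = Session()
    try:
        runs = (
            session.query(DagRun)
            .filter(DagRun.dag_id == dag_id, DagRun.state == State.SUCCESS)
            .order_by(DagRun.execution_date.desc())
            .limit(n)
            .all()
        )
        return [run.execution_date for run in runs]
    finally:
        session.close()

print(last_successful_execution_dates("my_dag"))  # placeholder dag_id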

Airflow on Kubernetes: Errno 13 - Permission denied: '/opt/airflow/logs/scheduler'

Submitted by 泄露秘密 on 2021-01-05 11:26:46

Question: I am running Airflow on Kubernetes from the stable Helm chart, in an AWS environment. This error occurs with and without mounting any external volumes for log storage. I tried to set the configuration of the [logs] section to point to an EFS volume that I created. The PV gets mounted through a PVC, but my containers (scheduler and web) are crashing due to the following error:

*** executing Airflow initdb...
Unable to load the config, contains a configuration error.
Traceback
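A hedged sketch of one common fix, not confirmed as the accepted answer: Kubernetes mounts volumes root-owned by default while the Airflow image runs as a non-root user, so giving the pods a matching fsGroup in their securityContext makes the logs volume writable. The uid/gid of 50000 is the default for the 'airflow' user in the official images and is an assumption here, as is the chart passing this values fragment through to the pods.

# values.yaml fragment (assumption: the chart forwards a pod-level
# securityContext; 50000 is the official image's 'airflow' uid/gid)
securityContext:
  runAsUser: 50000
  fsGroup: 50000   # makes mounted volumes (e.g. the logs EFS PV) group-writable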