airflow-scheduler

DAG is not visible on Airflow UI

Submitted by 自闭症网瘾萝莉.ら on 2020-06-17 09:34:28
Question: This is my DAG file in the dags folder. Code that goes along with the Airflow tutorial located at: http://airflow.readthedocs.org/en/latest/tutorial.html """ from airflow import DAG from airflow.operators.dummy_operator import DummyOperator from airflow.operators.python_operator import PythonOperator from datetime import datetime, timedelta from work_file import Test class Main(Test): def __init__(self): super(Test, self).__init__() def create_dag(self): default_args = { "owner": "airflow", "depends_on …
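A common reason a DAG defined this way never shows up in the UI is that the DAG object is only built inside a class or method and never assigned to a module-level variable, so the scheduler's DAG-file parser has nothing to register. The sketch below is a hedged illustration of that fix (the create_dag factory name echoes the excerpt; the dag_id, schedule and task are assumptions, not the poster's full code):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def create_dag():
    # Build and return a DAG; the object still has to reach module (global) scope.
    default_args = {"owner": "airflow", "start_date": datetime(2020, 6, 1)}
    dag = DAG("my_visible_dag", default_args=default_args, schedule_interval="@daily")
    DummyOperator(task_id="start", dag=dag)
    return dag

# The scheduler only registers DAG objects it finds in the module's global namespace,
# so the factory's result must be assigned to a top-level variable.
dag = create_dag()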

Airflow Scheduler continuously issues warnings when using PostgreSQL 12 as the backend database

Submitted by 别说谁变了你拦得住时间么 on 2020-06-17 09:11:26
Question: While running, the airflow scheduler keeps printing the following messages and tasks are NOT getting picked up. [2020-02-21 09:21:20,696] {dag_processing.py:663} WARNING - DagFileProcessorManager (PID=11895) exited with exit code -11 - re-launching [2020-02-21 09:21:20,699] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 11898 [2020-02-21 09:21:20,711] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]> [2020-02-21 09:21:20,725] {settings.py:253} INFO …

How to stop a DAG from backfilling? catchup_by_default=False and catchup=False do not seem to stop the Airflow scheduler from backfilling

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-05-23 17:49:13
Question: The setting catchup_by_default=False in airflow.cfg does not seem to work. Adding catchup=False to the DAG doesn't work either. Here's how to reproduce the issue. I always start from a clean slate by running airflow resetdb . As soon as I unpause the DAG, the tasks start to backfill. Here's the setup for the DAG; I'm just using the tutorial example. default_args = { "owner": "airflow", "depends_on_past": False, "start_date": datetime(2018, 9, 16), "email": ["airflow@airflow.com"], …
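When catchup misbehaves, the usual culprit is where the flag is placed: catchup must be a keyword argument of the DAG constructor itself, because anything put in default_args is forwarded to operators and silently ignored by the DAG. A hedged sketch reusing the tutorial values from the excerpt (the dag_id and schedule are assumptions):

from datetime import datetime, timedelta
from airflow import DAG

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2018, 9, 16),
}

# catchup is a DAG-level argument; putting catchup=False inside default_args has no effect.
dag = DAG(
    "tutorial",
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    catchup=False,  # only the most recent interval is scheduled when the DAG is unpaused
)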

How to Trigger a DAG on the success of another DAG in Airflow using Python?

Submitted by 跟風遠走 on 2020-05-14 17:47:57
Question: I have a Python DAG Parent Job and a DAG Child Job . The tasks in the Child Job should be triggered on the successful completion of the Parent Job tasks, which run daily. How can I add an external job trigger? MY CODE from datetime import datetime, timedelta from airflow import DAG from airflow.operators.postgres_operator import PostgresOperator from utils import FAILURE_EMAILS yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time()) default_args = { 'owner': 'airflow', …
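One common way to wire this up (a sketch under assumptions, not necessarily the poster's final solution; the dag_ids parent_job and child_job and the dummy task are placeholders) is to make a TriggerDagRunOperator the last task of the parent DAG, so the child DAG only starts once everything upstream has succeeded:

from datetime import datetime
from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.dummy_operator import DummyOperator

with DAG("parent_job", start_date=datetime(2020, 5, 1),
         schedule_interval="@daily", catchup=False) as parent_dag:
    last_parent_task = DummyOperator(task_id="last_parent_task")  # stand-in for the real work

    trigger_child = TriggerDagRunOperator(
        task_id="trigger_child_job",
        trigger_dag_id="child_job",  # dag_id of the DAG to start on success
    )

    last_parent_task >> trigger_child

An alternative is an ExternalTaskSensor at the start of the child DAG, which waits for a task of the parent DAG to reach the success state for the matching execution date.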

Airflow DAG is running for all the retries

Submitted by 孤街醉人 on 2020-04-17 22:06:18
Question: I have a DAG that has been running for a few months, and for the last week it has been behaving abnormally. I am running a bash operator which executes a shell script, and in the shell script we have a hive query. The number of retries is set to 4, as below. default_args = { 'owner': 'airflow', 'depends_on_past': False, 'email': ['airflow@example.com'], 'email_on_failure': False, 'email_on_retry': False, 'retries': 4 , 'retry_delay': timedelta(minutes=5) } I can see in the log that it's triggering the hive query and losing the …

Airflow DAG Running Every Second Rather Than Every Minute

Submitted by 感情迁移 on 2020-02-03 12:59:32
Question: I'm trying to schedule my DAG to run every minute, but it seems to be running every second instead. Based on everything I've read, I should just need to include schedule_interval='*/1 * * * *', #..every 1 minute in my DAG and that's it, but it's not working. Here's a simple example I set up to test it out: from airflow import DAG from airflow.operators import SimpleHttpOperator, HttpSensor, EmailOperator, S3KeySensor from datetime import datetime, timedelta from airflow.operators.bash_operator …
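A frequent cause of the "running every second" symptom is not the cron expression but a backlog: an old (or dynamically computed) start_date combined with catchup lets the scheduler fire all the missed intervals back-to-back as soon as the DAG is unpaused. A hedged sketch of a per-minute DAG that avoids that (the dag_id, task and start_date are assumptions):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    "every_minute_example",
    start_date=datetime(2020, 2, 1),  # static start_date, not datetime.now()
    schedule_interval="*/1 * * * *",  # cron: once per minute
    catchup=False,                    # do not replay missed intervals in a burst
)

say_hi = BashOperator(task_id="say_hi", bash_command="echo hi", dag=dag)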

Airflow skip current task

Submitted by 一个人想着一个人 on 2020-01-24 12:15:07
Question: Is there a way for Airflow to skip the current task from within the (Python)Operator? For example: def execute(): if condition: skip_current_task() task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag) Skipping downstream tasks doesn't suit me (a solution proposed in this answer: How to skip tasks on Airflow?), and neither does branching. Is there a way for a task to mark its own state as skipped from within the Operator? Answer 1: Figured it out! Skipping a task is as easy as: def execute( …
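The answer preview is cut off mid-definition; the mechanism it most likely refers to (hedged, since the rest of the snippet is not shown) is raising AirflowSkipException from the callable, which ends the task and marks it as skipped rather than failed:

from airflow.exceptions import AirflowSkipException
from airflow.operators.python_operator import PythonOperator

def execute():
    if condition:  # 'condition' stands in for whatever check decides to skip
        # Raising AirflowSkipException stops the task and records its state as 'skipped'.
        raise AirflowSkipException("Condition met, skipping this task")
    do_real_work()  # placeholder for the actual task logic

task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag)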

How to skip tasks on Airflow?

Submitted by 浪子不回头ぞ on 2020-01-23 07:51:19
Question: I'm trying to understand whether Airflow supports skipping tasks in a DAG for ad-hoc executions. Let's say my DAG graph looks like this: task1 > task2 > task3 > task4 And I would like to start my DAG manually from task3, what is the best way of doing that? I've read about ShortCircuitOperator , but I'm looking for a more ad-hoc solution which can apply once the execution is triggered. Thanks! Answer 1: You can incorporate the SkipMixin that the ShortCircuitOperator uses under the hood to skip …
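The answer is truncated, but the mechanism it names can be sketched by mirroring what ShortCircuitOperator does internally (Airflow 1.10-style imports; the operator name and the condition callable are assumptions, and how the condition is derived for the "start from task3" case is left to the DAG author):

from airflow.models import SkipMixin
from airflow.operators.python_operator import PythonOperator

class ConditionallySkipOperator(PythonOperator, SkipMixin):
    # Runs python_callable; if it returns a falsy value, all downstream tasks are skipped
    # using the same SkipMixin.skip() call that ShortCircuitOperator relies on.
    def execute(self, context):
        proceed = super(ConditionallySkipOperator, self).execute(context)
        if proceed:
            return
        downstream_tasks = context["task"].get_flat_relatives(upstream=False)
        if downstream_tasks:
            self.skip(context["dag_run"], context["ti"].execution_date, downstream_tasks)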

Use airflow hive operator and output to a text file

Submitted by 有些话、适合烂在心里 on 2020-01-14 22:33:37
Question: Hi, I want to execute a Hive query using the Airflow hive operator and output the result to a file. I don't want to use INSERT OVERWRITE here. hive_ex = HiveOperator( task_id='hive-ex', hql='/sql/hive-ex.sql', hiveconfs={ 'DAY': '{{ ds }}', 'YESTERDAY': '{{ yesterday_ds }}', 'OUTPUT': '{{ file_path }}'+'csv', }, dag=dag ) What is the best way to do this? I know how to do this using the bash operator, but want to know if we can use the hive operator. hive_ex = BashOperator( task_id='hive-ex', bash_command= …
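The excerpt stops part-way through the BashOperator fallback the poster mentions. HiveOperator itself only submits the HQL through the Hive CLI hook and sends its output to the task log, so capturing results in a local text file usually means shelling out. A hedged sketch of that fallback (the hiveconf names and query path follow the excerpt; the task_id and output path are assumptions):

from airflow.operators.bash_operator import BashOperator

hive_to_file = BashOperator(
    task_id="hive-ex-to-file",
    bash_command=(
        "hive -hiveconf DAY={{ ds }} -hiveconf YESTERDAY={{ yesterday_ds }} "
        "-f /sql/hive-ex.sql > {{ params.output_path }}"
    ),
    params={"output_path": "/tmp/hive-ex-output.csv"},  # assumed output location
    dag=dag,
)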