apache-airflow

Airflow kills my tasks after 1 minute

只谈情不闲聊 submitted on 2019-12-23 09:57:15

Question: I have a very simple DAG with two tasks, like the following:

default_args = {
    'owner': 'me',
    'start_date': dt.datetime.today(),
    'retries': 0,
    'retry_delay': dt.timedelta(minutes=1)
}

dag = DAG(
    'test DAG',
    default_args=default_args,
    schedule_interval=None
)

t0 = PythonOperator(
    task_id="task 1",
    python_callable=run_task_1,
    op_args=[arg_1, args_2, args_3],
    dag=dag,
    execution_timeout=dt.timedelta(minutes=60)
)

t1 = PythonOperator(
    task_id="task 2",
    python_callable=run_task_2,
    dag=dag,
    execution
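For reference, a per-task timeout in Airflow 1.x is attached through the operator's execution_timeout argument, which is what the question already does. Below is a minimal, hedged sketch of that pattern; run_task_1, the DAG id, and the dates are placeholders rather than anything from the asker's project:

import datetime as dt

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def run_task_1():
    # placeholder body; the real callable comes from the asker's project
    pass


default_args = {
    'owner': 'me',
    'start_date': dt.datetime(2019, 12, 1),  # a fixed date; a dynamic start_date like datetime.today() is often discouraged
    'retries': 0,
    'retry_delay': dt.timedelta(minutes=1),
}

dag = DAG('test_dag', default_args=default_args, schedule_interval=None)

t0 = PythonOperator(
    task_id='task_1',                             # underscores instead of spaces in IDs
    python_callable=run_task_1,
    execution_timeout=dt.timedelta(minutes=60),   # per-task timeout; a task exceeding it is killed
    dag=dag,
)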

AssertionError: INTERNAL: No default project is specified

≯℡__Kan透↙ submitted on 2019-12-21 17:55:10

Question: New to Airflow. I am trying to run a SQL query and store the result in a BigQuery table, but I get the following error. I'm not sure where to set up the default_project_id. Please help me.

Error:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 28, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 585, in test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py",
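As an illustration only (not the asker's setup), the contrib BigQuery operator in Airflow 1.x resolves the project from the Google Cloud connection it points at, so the sketch below assumes a connection named my_gcp_conn whose Extras carry the project ID; the connection name, dataset, table, and SQL are placeholders, and parameter names vary slightly across 1.x releases:

import datetime as dt

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG('bq_example', start_date=dt.datetime(2019, 1, 1), schedule_interval=None)

# Assumes an Airflow connection named 'my_gcp_conn' (type: Google Cloud Platform)
# whose Extras include the project id, e.g.
# {"extra__google_cloud_platform__project": "my-gcp-project"}
bq_task = BigQueryOperator(
    task_id='run_query',
    sql='SELECT 1',                                    # placeholder query
    destination_dataset_table='my_dataset.my_table',   # placeholder target table
    bigquery_conn_id='my_gcp_conn',
    use_legacy_sql=False,
    dag=dag,
)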

Debugging Broken DAGs

混江龙づ霸主 submitted on 2019-12-20 20:37:23

Question: When the Airflow webserver shows errors like Broken DAG: [<path/to/dag>] <error>, how and where can we find the full stack trace for these exceptions? I tried these locations:

/var/log/airflow/webserver -- had no logs in the timeframe of execution; the other logs were binary, and decoding them with strings gave no useful information.
/var/log/airflow/scheduler -- had some logs, but they were in binary form; I tried to read them and they looked to be mostly SQLAlchemy logs, probably for Airflow's database.
/var
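One way to surface the full import traceback (a common technique, not quoted from any answer here) is to parse the DAG folder yourself: either run the DAG file directly with python, or build a DagBag in a Python shell and inspect its import_errors. A hedged sketch, assuming the default dags_folder is configured on the machine that parses the DAGs:

# Run where the scheduler/webserver parses the DAG files.
from airflow.models import DagBag

dagbag = DagBag()  # parses the configured dags_folder
for dag_file, stacktrace in dagbag.import_errors.items():
    print(dag_file)
    print(stacktrace)  # the full error that the UI truncates into "Broken DAG: ..."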

Airflow authentication setups fails with “AttributeError: can't set attribute”

风格不统一 submitted on 2019-12-19 15:27:01

Question: The Airflow 1.8 password authentication setup, as described in the docs, fails at the step user.password = 'set_the_password' with the error AttributeError: can't set attribute.

Answer 1: It's better to simply use PasswordUser's new _set_password attribute:

# Instead of user.password = 'password'
user._set_password = 'password'

Answer 2: This is due to an update of SQLAlchemy to a version >= 1.2, which introduced a backwards-incompatible change. You can fix this by explicitly installing a SQLAlchemy
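For context, the documented Airflow 1.8 setup creates the user through the password_auth backend; below is a hedged sketch of that flow with the workaround from Answer 1 applied. The username, email, and password are placeholders:

from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'admin'
user.email = 'admin@example.com'
user._set_password = 'some_password'   # workaround for "AttributeError: can't set attribute"

session = settings.Session()
session.add(user)
session.commit()
session.close()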

Airflow depends_on_past for whole DAG

纵然是瞬间 submitted on 2019-12-19 07:06:31

Question: Is there a way in Airflow to use depends_on_past for an entire DagRun, not just applied to a task? I have a daily DAG, and the Friday DagRun errored on the 4th task, yet the Saturday and Sunday DagRuns still ran as scheduled. Using depends_on_past = True would have paused the DagRun at the same 4th task, but the first 3 tasks would still have run. I can see in the DagRun DB table that there is a state column which contains failed for the Friday DagRun. What I want is a way of configuring
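For reference, the per-task mechanism the question wants to lift to DagRun level is normally applied to every task through default_args; a hedged sketch of how depends_on_past and the related wait_for_downstream flag are set is shown below. The DAG id, dates, and schedule are placeholders:

import datetime as dt

from airflow import DAG

default_args = {
    'owner': 'me',
    'start_date': dt.datetime(2019, 1, 1),
    # Each task instance waits for its own previous run to have succeeded.
    'depends_on_past': True,
    # Stricter variant: also wait until the immediate downstream tasks of the
    # task's previous run have succeeded before starting.
    'wait_for_downstream': True,
}

dag = DAG(
    'daily_pipeline',               # placeholder id
    default_args=default_args,
    schedule_interval='@daily',
    max_active_runs=1,              # keep runs from overlapping while catching up
)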

Airflow: Log file isn't local, Unsupported remote log location

删除回忆录丶 submitted on 2019-12-18 14:51:55

Question: I am not able to see the logs attached to the tasks from the Airflow UI. The log-related settings in the airflow.cfg file are:

remote_base_log_folder =
base_log_folder = /home/my_projects/ksaprice_project/airflow/logs
worker_log_server_port = 8793
child_process_log_directory = /home/my_projects/ksaprice_project/airflow/logs/scheduler

Although I am setting remote_base_log_folder, it is trying to fetch the log from http://:8793/log/tutorial/print_date/2017-08-02T00:00:00 - I don't understand this behavior.
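For illustration only (not the asker's file), in older Airflow 1.x versions these keys live in the [core] section of airflow.cfg. Leaving remote_base_log_folder empty means logs are served from the worker over worker_log_server_port, while filling it in together with a connection id switches the UI to reading from remote storage. A hedged sketch with placeholder paths, bucket, and connection name; key names and sections shift between 1.x releases:

[core]
# Local task logs, read by the webserver from http://<worker>:8793 when remote logging is off
base_log_folder = /home/my_user/airflow/logs

# Remote logging (optional): both values need to be set for the UI to read from remote storage
remote_base_log_folder = s3://my-placeholder-bucket/airflow/logs
remote_log_conn_id = my_s3_conn

worker_log_server_port = 8793
child_process_log_directory = /home/my_user/airflow/logs/scheduler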

How to restart a failed task on Airflow

别说谁变了你拦得住时间么 submitted on 2019-12-18 10:50:11

Question: I am using a LocalExecutor and my DAG has 3 tasks, where task C is dependent on task A. Task B and task A can run in parallel, something like below:

A --> C
B

So task A has failed, but task B ran fine. Task C is yet to run, as task A has failed. My question is: how do I re-run task A alone, so that task C runs once task A completes and the Airflow UI marks them as success?

Answer 1: In the UI:

Go to the DAG, and the dag run of the run you want to change
Click on GraphView
Click on task A
Click "Clear"
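The same clearing can also be done from the command line in Airflow 1.x; below is a hedged sketch, where the dag id, task id, and dates are placeholders and flag spellings vary slightly between releases:

# Clear task A (and, with --downstream, everything that depends on it) for one
# execution date, so the scheduler re-runs A and then C once A succeeds.
airflow clear my_dag \
    --task_regex '^task_a$' \
    --start_date 2019-12-13 \
    --end_date 2019-12-13 \
    --downstream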

Make custom Airflow macros expand other macros

╄→гoц情女王★ submitted on 2019-12-17 18:43:12

Question: Is there any way to make a user-defined macro in Airflow which is itself computed from other macros?

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    schedule_interval='0 21 * * *',
    user_defined_macros={
        'next_execution_date': '{{ dag.following_schedule(execution_date) }}',
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date }}"',
    dag=dag,
)

The use case here is to back-port the new Airflow v1.8 next
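The snippet above registers the macro as a Jinja string, which gets inserted verbatim rather than rendered a second time. One common workaround (an assumption about the intended fix, not quoted from an answer in this excerpt) is to register a callable instead and invoke it from the template, since dag and execution_date are both available in the render context. A hedged sketch; the start_date is a placeholder added only so the sketch is complete:

import datetime as dt

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple',
    start_date=dt.datetime(2019, 1, 1),
    schedule_interval='0 21 * * *',
    user_defined_macros={
        # A callable is evaluated at render time, so it can use other
        # context values such as dag and execution_date.
        'next_execution_date': lambda execution_date, dag: dag.following_schedule(execution_date),
    },
)

task = BashOperator(
    task_id='bash_op',
    bash_command='echo "{{ next_execution_date(execution_date, dag) }}"',
    dag=dag,
)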