Airflow latency between tasks

Posted by 时光毁灭记忆、已成空白 on 2019-12-05 12:10:14

It is by design. For instance, I use Airflow to run large workflows in which some tasks take a really long time. Airflow is not meant for tasks that take only seconds to execute; it can be used for that, of course, but it might not be the most suitable tool.

That said, there is not much you can do, since you have already found the key settings to configure.

Additionally, you might want to try increasing the number of scheduler threads:

   [scheduler]
   max_threads = 4

Alternatively, set the equivalent environment variable:

   AIRFLOW__SCHEDULER__MAX_THREADS=4
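
The environment variable name follows Airflow's AIRFLOW__{SECTION}__{KEY} pattern (note the double underscores). As a rough sketch, the mapping can be written out in Python; the helper name airflow_env_var and the override values are made up for illustration, and the option names are the ones mentioned in this thread, which may be named differently in other Airflow versions.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an airflow.cfg option.

    Hypothetical helper: it only applies the AIRFLOW__{SECTION}__{KEY}
    naming pattern and does not check that the option exists.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Settings discussed in this thread; the values are illustrative only.
overrides = {
    ("scheduler", "max_threads"): "4",
    ("scheduler", "scheduler_heartbeat_sec"): "5",
    ("core", "logging_level"): "DEBUG",
}

# Translate each (section, key) pair into its environment variable form.
env = {airflow_env_var(s, k): v for (s, k), v in overrides.items()}

for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

The printed lines can be pasted into whatever sets the scheduler's environment (a systemd unit, a Docker env file, and so on).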

However, do not count on the latency decreasing by much.

Thirty seconds is fairly high for inter-task latency. In well-tuned environments I've seen, ~4-6 seconds between a task and a dependent task has been a fairly reasonable lower bound, even for environments with many thousands of DAGs.
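
To compare your own numbers against that range, it helps to measure the gap explicitly: the time between the end of the last upstream task and the start of the dependent task. Below is a minimal sketch, assuming you have copied start and end timestamps out of the task instance view by hand; the task names, timestamps, and the function inter_task_latency are all hypothetical and not part of Airflow.

```python
from datetime import datetime

# Hypothetical (start, end) timestamps per task, copied from the Airflow UI.
runs = {
    "extract": (datetime(2019, 12, 5, 12, 0, 0), datetime(2019, 12, 5, 12, 0, 10)),
    "transform": (datetime(2019, 12, 5, 12, 0, 40), datetime(2019, 12, 5, 12, 1, 0)),
    "load": (datetime(2019, 12, 5, 12, 1, 5), datetime(2019, 12, 5, 12, 1, 20)),
}

# Hypothetical dependency map: downstream task -> list of upstream tasks.
deps = {"transform": ["extract"], "load": ["transform"]}


def inter_task_latency(runs, deps):
    """Seconds between the last upstream finish and each downstream start."""
    gaps = {}
    for task, upstreams in deps.items():
        last_upstream_end = max(runs[u][1] for u in upstreams)
        gaps[task] = (runs[task][0] - last_upstream_end).total_seconds()
    return gaps


latency = inter_task_latency(runs, deps)
for task, seconds in latency.items():
    print(f"{task}: waited {seconds:.0f}s after its upstream tasks finished")
```

With the sample data, transform waits 30 seconds and load waits 5 seconds, so only the first gap would be worth investigating.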

As you've already stated, running the scheduler loop more often (a lower scheduler_heartbeat_sec) and increasing the number of scheduler threads (scheduler.max_threads) are the best ways to decrease scheduling delays. If your tasks are blocked on other conditions (which you can check in the logs; set core.logging_level = DEBUG for even more information), then you should resolve those first.

If you've adjusted both the scheduler heartbeat and the number of scheduler threads and you still see high scheduling delays, then you may need to consider using a more powerful machine.
