Airbnb Airflow using all system resources


We've set up Airbnb/Apache Airflow for our ETL using LocalExecutor, and as we've started building more complex DAGs, we've noticed that Airflow has started usin


7 answers

既然无缘

2021-02-03 19:55

I tried to run Airflow on an AWS t2.micro instance (1 vCPU, 1 GB of memory, eligible for the free tier) and had the same issue: the worker consumed 100% of the CPU and all available memory.

The EC2 instance was completely stuck and unusable, and of course Airflow wasn't working.

So I created a 4 GB swap file using the method described here. With the swap in place there were no more issues and Airflow was fully functional. Of course, with only one vCPU you cannot expect great performance, but it runs.

再見小時候

2021-02-03 19:56
The key point is how DAG files are processed. To reduce scheduler CPU usage from 80%+ to 30% on an 8-core server, I changed two config keys:
```
min_file_process_interval from 0 to 60.
max_threads from 1000 to 50. 
```
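As a rough illustration of the change above, here is a minimal Python sketch that rewrites those two keys with the standard-library `configparser`. The `SAMPLE_CFG` text and the `tune_scheduler` helper are hypothetical stand-ins, not part of Airflow; a real airflow.cfg has many more keys.

```python
import configparser
import io

# Hypothetical excerpt of an airflow.cfg, holding only the two keys discussed.
SAMPLE_CFG = """\
[scheduler]
min_file_process_interval = 0
max_threads = 1000
"""

def tune_scheduler(cfg_text, interval=60, threads=50):
    """Return cfg_text with the two scheduler keys rewritten."""
    # interpolation=None leaves any '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parser["scheduler"]["min_file_process_interval"] = str(interval)
    parser["scheduler"]["max_threads"] = str(threads)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

tuned = tune_scheduler(SAMPLE_CFG)
print(tuned)
```

Note that `configparser` drops comments when it writes a file back, so on a real config it may be safer to edit the two lines by hand.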

误落风尘

2021-02-03 20:01

Try changing the following settings in airflow.cfg:

# after how much time a new DAGs should be picked up from the filesystem
min_file_process_interval = 0

# How many seconds to wait between file-parsing loops to prevent the logs from being spammed.
min_file_parsing_loop_time = 1


执念已碎

2021-02-03 20:01

I faced the same issue deploying Airflow on EKS. It was resolved by updating max_threads to 128 in the Airflow config.

max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value (e.g numbers of cpus where scheduler runs - 1) in production.

From here https://airflow.apache.org/docs/stable/faq.html
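The FAQ's suggestion of "number of CPUs minus one" can be sketched in Python as follows. The helper name `suggested_max_threads` is hypothetical, and the result is only a starting point to tune, not a value Airflow computes for you.

```python
import os

def suggested_max_threads(cpu_count=None):
    """Return (CPUs - 1), but never less than 1."""
    if cpu_count is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - 1)

print(suggested_max_threads())
```

By this rule an 8-core scheduler host would get 7 threads, far below both the 128 mentioned above and the 1000 reported in another answer.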

天命终不由人

2021-02-03 20:07
I have also tried everything I could to get the CPU usage down and Matthew Housley's advice regarding MIN_FILE_PROCESS_INTERVAL was what did the trick.

At least until airflow 1.10 came around... then the CPU usage went through the roof again.

So here is everything I had to do to get Airflow to work well on a standard DigitalOcean droplet with 2 GB of RAM and 1 vCPU:

1. Scheduler File Processing

Prevent Airflow from constantly reloading the DAGs by setting: AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=60
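The variable name above follows Airflow's AIRFLOW__{SECTION}__{KEY} convention for overriding airflow.cfg entries through the environment. A minimal sketch of that mapping, with `airflow_env_var` as a hypothetical helper name:

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())

# Environment overrides for the setting discussed above.
overrides = {
    airflow_env_var("scheduler", "min_file_process_interval"): "60",
}
for name, value in overrides.items():
    print("{}={}".format(name, value))
```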

2. Fix airflow 1.10 scheduler bug

The AIRFLOW-2895 bug in Airflow 1.10 causes high CPU load because the scheduler keeps looping without a break.

It's already fixed in master and will hopefully be included in Airflow 1.10.1, but it could take weeks or months until it's released. In the meantime, this patch solves the issue:
```
--- jobs.py.orig    2018-09-08 15:55:03.448834310 +0000
+++ jobs.py     2018-09-08 15:57:02.847751035 +0000
@@ -564,6 +564,7 @@

         self.num_runs = num_runs
         self.run_duration = run_duration
+        self._processor_poll_interval = 1.0

         self.do_pickle = do_pickle
         super(SchedulerJob, self).__init__(*args, **kwargs)
@@ -1724,6 +1725,8 @@
             loop_end_time = time.time()
             self.log.debug("Ran scheduling loop in %.2f seconds",
                            loop_end_time - loop_start_time)
+            self.log.debug("Sleeping for %.2f seconds", self._processor_poll_interval)
+            time.sleep(self._processor_poll_interval)

             # Exit early for a test mode
             if processor_manager.max_runs_reached():
```
Apply it with: patch -d /usr/local/lib/python3.6/site-packages/airflow/ < af_1.10_high_cpu.patch
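The idea behind the patch is simply to sleep between loop iterations so the loop stops spinning at full speed. A standalone sketch of that pattern follows; it is not Airflow's actual scheduler code, and `run_loop` with its parameters is a hypothetical name chosen for illustration.

```python
import time

def run_loop(work, iterations, poll_interval=1.0, sleep=time.sleep):
    """Call work() repeatedly, pausing poll_interval seconds after each call."""
    for _ in range(iterations):
        work()
        # Without this pause the loop would busy-spin and pin a CPU core.
        sleep(poll_interval)

# Demo with a fake sleep that records the requested pauses instead of waiting.
ticks = []
naps = []
run_loop(lambda: ticks.append("tick"), iterations=3, poll_interval=1.0, sleep=naps.append)
print(ticks, naps)
```

Passing `sleep` as a parameter is only there to make the behaviour easy to check without actually waiting.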

3. RBAC webserver high CPU load

If you upgraded to use the new RBAC webserver UI, you may also notice that the webserver is using a lot of CPU persistently.

For some reason the RBAC interface uses a lot of CPU on startup. If you are running on a low powered server, this can cause a very slow webserver startup and permanently high CPU usage.

I have documented this bug as AIRFLOW-3037. To solve it you can adjust the config:
```
AIRFLOW__WEBSERVER__WORKERS=2 # Common rule of thumb: 2 * NUM_CPU_CORES + 1; set to 2 here
AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800 # Restart workers every 30 min instead of every 30 seconds
AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT=300 # Kill workers if they don't start within 5 min instead of 2 min
```
With all of these tweaks, my Airflow uses only a few percent of CPU when idle on a standard DigitalOcean droplet with 1 vCPU and 2 GB of RAM.
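For reference, the "2 * NUM_CPU_CORES + 1" rule of thumb mentioned in the webserver settings can be written out as below. The function name `webserver_workers` is hypothetical, and the result is a starting point only.

```python
def webserver_workers(cpu_cores):
    """Rule-of-thumb worker count: two per core, plus one."""
    return 2 * cpu_cores + 1

for cores in (1, 2, 4):
    print(cores, "core(s) ->", webserver_workers(cores), "workers")
```

For a 1-vCPU host the formula gives 3, so the value of 2 used above is slightly below the rule of thumb, which may be reasonable on a machine with little memory.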
旧巷少年郎

2021-02-03 20:09
For starters, you can use htop to monitor and debug your CPU usage.

I would suggest running the webserver and scheduler processes in the same Docker container, which avoids the overhead of running two containers on an EC2 t2.medium. Airflow workers need resources for downloading data and reading it into memory, but the webserver and scheduler are fairly lightweight processes. When you run the webserver, make sure you control the number of workers running on the instance using the CLI.
```
airflow webserver [-h] [-p PORT] [-w WORKERS]
                         [-k {sync,eventlet,gevent,tornado}]
                         [-t WORKER_TIMEOUT] [-hn HOSTNAME] [--pid [PID]] [-D]
                         [--stdout STDOUT] [--stderr STDERR]
                         [-A ACCESS_LOGFILE] [-E ERROR_LOGFILE] [-l LOG_FILE]
                         [-d]
```
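As a sketch of how the options in the usage text above might be combined, the following Python snippet builds an argument list using only the `-p`, `-w` and `-t` flags shown there. The helper name `webserver_command` and the chosen values are assumptions for illustration.

```python
import shlex

def webserver_command(port=8080, workers=2, worker_timeout=300):
    """Build an argv list for 'airflow webserver' with a fixed worker count."""
    return [
        "airflow", "webserver",
        "-p", str(port),            # port to listen on
        "-w", str(workers),         # number of webserver workers
        "-t", str(worker_timeout),  # worker timeout in seconds
    ]

cmd = webserver_command()
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where Airflow is installed.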
