The different performance of Python multiprocessing.Pool on MacOS and Linux systems

Posted by 生来就可爱ヽ(ⅴ<●) on 2021-02-19 07:53:07

Question


I'm a beginner in Python. I use multiprocessing.Pool in my project to improve performance.

Here's a snippet showing how I use multiprocessing.Pool.

I build the pools when my long-running server starts, and call Pool.apply_async every time the server gets a request:

import multiprocessing as mp
from multiprocessing import Pool

# Build the pools once, when the server starts
mp.set_start_method('forkserver')
self._driver_pool = Pool(processes=10)
self._executor_pool = Pool(processes=30)

# Submit a task to the pool each time a request arrives
driver = driver_class(driver_context, init_table, self._manager, **kwargs_dict)
future = self._driver_pool.apply_async(driver.run)

I tested the code on my own computer, which runs MacOS, and then deployed it on a Linux machine.

I found that on MacOS the Pool.apply_async call costs about 10 ms, but the same code on Linux costs about 2 s.

I don't understand why there is such a big difference in performance. Is there something wrong with the way I use multiprocessing.Pool?


Answer 1:


After some tests, I have a conjecture.

What I observe is that when the Pool size is set to 30, the first 30 requests are slow, but after that the cost per task drops significantly.

On MacOS, I compared the two scenarios with and without pyc files, and found that the cost rises after I delete the pyc files.

I suspect several things contribute to the performance difference:

  1. When a process is started with the 'forkserver' method, it loads all the resources it needs, including imported modules. That means the process looks for the pyc files, and if they are missing it compiles the Python source files to pyc files and then loads them.

  2. The processes in a Pool are never released, which means that once a process has loaded the pyc files into memory, it never loads them again.

  3. The Mac has an SSD, so a process on the Mac that loads pyc files gets better performance than a process on a machine without an SSD.
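If missing pyc files are part of the cost (point 1), one thing that might help is byte-compiling the project once at deploy time, so that worker processes do not have to compile on their first import. Below is a minimal sketch using the standard-library compileall module. The function names precompile and demo and the file name driver_example.py are made up for illustration, and the temporary directory only stands in for the real project directory.

```python
import compileall
import pathlib
import tempfile


def precompile(project_dir):
    """Byte-compile every .py file under project_dir; return True on success."""
    # quiet=1 prints only errors; force=False skips files whose .pyc is current.
    return bool(compileall.compile_dir(str(project_dir), quiet=1, force=False))


def demo():
    # Throwaway directory standing in for the deployed project (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "driver_example.py"
        src.write_text("VALUE = 1\n")
        ok = precompile(tmp)
        # The compiled file is written to a __pycache__ directory next to the source.
        cache_dir = pathlib.Path(tmp) / "__pycache__"
        cached = list(cache_dir.glob("driver_example.*.pyc"))
        return ok, len(cached)
```

The same thing can be done from the shell with python -m compileall followed by the project directory. This only removes the compile step; the pyc files still have to be read from disk by each worker.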

The question I now have is whether there is a way to pre-load resources for processes started with the 'forkserver' method, to get better performance.
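One possible approach, sketched here under assumptions rather than verified against this server: multiprocessing has set_forkserver_preload, which asks the forkserver process to import a list of modules so that workers forked from it inherit them already imported, and Pool accepts an initializer that runs once in each worker when it starts. The names HEAVY_MODULES, warm_up and build_pool are hypothetical, and the module list is a placeholder for whatever the driver classes really import.

```python
import importlib
import multiprocessing as mp

# Hypothetical list: replace with the modules your driver classes import.
HEAVY_MODULES = ("json", "decimal")


def warm_up(module_names=HEAVY_MODULES):
    """Import modules up front so the first real task does not pay for it."""
    imported = []
    for name in module_names:
        importlib.import_module(name)
        imported.append(name)
    return imported


def build_pool(processes, module_names=HEAVY_MODULES):
    """Create a forkserver pool whose workers already have the modules loaded."""
    ctx = mp.get_context("forkserver")
    # Ask the forkserver process to import the modules once; workers forked
    # from it inherit the imported state. This has to happen before the
    # forkserver process starts, i.e. before the first pool is created.
    # "__main__" is kept in the list because it is the default preload entry.
    ctx.set_forkserver_preload(["__main__", *module_names])
    # The preload is only a hint (import errors are ignored), so the
    # initializer repeats the imports once in each worker as a fallback.
    return ctx.Pool(processes=processes, initializer=warm_up,
                    initargs=(module_names,))
```

build_pool would be called once at server startup, from code that runs under an if __name__ == "__main__": guard, for example build_pool(10) for the driver pool and build_pool(30) for the executor pool. Whether this removes the slow first 30 requests depends on whether imports really are the bottleneck, which the tests above suggest but do not prove.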



Source: https://stackoverflow.com/questions/65805160/the-different-performance-of-python-multiprocessing-pool-on-macos-and-linux-syst
