400 threads in 20 processes outperform 400 threads in 4 processes while performing an I/O-bound task

Posted by 三世轮回 on 2019-11-28 10:27:07

Your task is I/O-bound rather than CPU-bound: the threads spend most of their time asleep, waiting for network data and the like, rather than using the CPU.

So adding more threads than CPUs works here as long as I/O is still the bottleneck. The effect will only subside once there are so many threads that enough of them are ready at a time to start actively competing for CPU cycles (or when your network bandwidth is exhausted, whichever comes first).
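To illustrate the point above, here is a minimal sketch in which `time.sleep` stands in for a blocking network call (an assumption for illustration; the original workload is not shown). The names `fake_io` and `run_batch` are hypothetical. Because sleeping threads do not use the CPU, the wall time should stay close to a single delay rather than the sum of all delays.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Stand-in for a blocking network read: time.sleep releases the GIL
    # while waiting, much like a real socket call would.
    time.sleep(delay)
    return delay


def run_batch(n_tasks, n_threads, delay=0.05):
    # Run n_tasks simulated I/O calls on a pool of n_threads threads and
    # return (wall-clock time, combined time spent waiting).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(fake_io, [delay] * n_tasks))
    return time.perf_counter() - start, sum(results)


elapsed, total_wait = run_batch(n_tasks=20, n_threads=20)
print(f"wall time {elapsed:.2f}s for {total_wait:.2f}s of combined waiting")
```

Raising `n_tasks` and `n_threads` together should keep the wall time roughly flat until the threads start competing for the CPU or the real network link saturates.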


As for why 20 threads per process is faster than 100 threads per process: this is most likely due to CPython's GIL. Python threads in the same process need to wait not only for I/O but for each other, too.
When dealing with I/O, the Python machinery:

  1. Converts all Python objects involved into C objects (in many cases, this can be done without physically copying the data)
  2. Releases the GIL
  3. Performs the I/O in C (which involves waiting for an arbitrary amount of time)
  4. Reacquires the GIL
  5. Converts the result to a Python object if applicable

If there are enough threads in the same process, it becomes increasingly likely that another thread holds the GIL when step 4 is reached, so the waking thread has to wait for it, which adds a random delay.
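The delay at step 4 can be observed with a rough sketch like the one below. It assumes a standard CPython build with the GIL, uses `time.sleep` as a stand-in for the I/O wait, and the names `worker` and `mean_overshoot` are hypothetical. Each thread measures how much longer than the requested delay it took to get back into Python code, then does a little pure-Python work so that other waking threads have something to contend with. The numbers vary from machine to machine, so no expected output is given.

```python
import threading
import time


def worker(delay, iterations, overshoots):
    for _ in range(iterations):
        start = time.perf_counter()
        time.sleep(delay)  # GIL is released while waiting (steps 2-3)
        # Execution resumes here only after the GIL is reacquired (step 4),
        # so anything beyond `delay` includes the wait for the GIL.
        overshoots.append(time.perf_counter() - start - delay)
        # Pure-Python work that holds the GIL and delays other waking threads.
        sum(i * i for i in range(5_000))


def mean_overshoot(n_threads, delay=0.01, iterations=10):
    overshoots = []  # appending from several threads is safe under the GIL
    threads = [
        threading.Thread(target=worker, args=(delay, iterations, overshoots))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(overshoots) / len(overshoots)


for n in (1, 20, 100):
    print(f"{n:3d} threads: mean extra delay {mean_overshoot(n) * 1000:.3f} ms")
```

If the GIL explanation holds, the mean extra delay should tend to grow with the number of threads in the process, although scheduler noise can blur the trend on a busy machine.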


Now, when it comes to lots of processes, other factors come into play, such as memory swapping: unlike threads, processes running the same code don't share memory. I'm fairly sure that many processes competing for resources incur other delays that threads do not, but I can't name them off the top of my head. That's why the performance becomes unstable.
