Why does this Python script run 4x slower on multiple cores than on a single core?

别说谁变了你拦得住时间么 submitted on 2019-11-29 07:03:48

It's due to GIL thrashing when multiple native threads compete for the GIL. David Beazley's materials on this subject will tell you everything you want to know.

See info here for a nice graphical representation of what is happening.

Python 3.2 introduced changes to the GIL that help with this problem, so you should see improved performance with 3.2 and later.
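Part of the 3.2 change was to replace the old tick-based check with a time-based switch interval, which you can inspect from the interpreter. A minimal sketch, assuming CPython 3.2 or later; the helper name describe_switch_interval is made up for illustration:

```python
import sys


def describe_switch_interval():
    """Return the interpreter's thread switch interval in seconds."""
    # sys.getswitchinterval() is available from Python 3.2 onward.
    return sys.getswitchinterval()


if __name__ == "__main__":
    print("switch interval: %.4f s" % describe_switch_interval())
```

sys.setswitchinterval() changes the value, but tuning it is rarely a substitute for avoiding contention on the GIL in the first place.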

It should also be noted that the GIL is an implementation detail of CPython, the reference implementation of the language. Other implementations, such as Jython, do not have a GIL and do not suffer from this particular problem.
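If you are not sure which implementation a script is running on, the standard library can tell you. A small sketch; the function name implementation_name is just an illustrative wrapper:

```python
import platform


def implementation_name():
    """Return the name of the running implementation, e.g. 'CPython' or 'Jython'."""
    return platform.python_implementation()


if __name__ == "__main__":
    print(implementation_name())
```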

The rest of D. Beazley's info on the GIL will also be helpful to you.

To specifically answer your question about why performance is so much worse when multiple cores are involved, see slides 29-41 of the Inside the GIL presentation. It goes into a detailed discussion of multicore GIL contention as opposed to multiple threads on a single core. Slide 32 specifically shows that the number of system calls due to thread signaling overhead goes through the roof as you add cores. This is because the threads are now running simultaneously on different cores, which allows them to engage in a true GIL battle, as opposed to multiple threads sharing a single CPU. A good summary bullet from the above presentation is:

With multiple cores, CPU-bound threads get scheduled simultaneously (on different cores) and then have a GIL battle.
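To observe the effect on your own machine, you can time a CPU-bound loop run twice in sequence against the same loop run in two threads. This is only a rough sketch in the spirit of the countdown example from the presentation; the function names and the iteration count are my own choices, and the numbers will vary with the Python version and hardware:

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run count_down(n) `repeats` times in the main thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, num_threads=2):
    """Run count_down(n) in `num_threads` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 10000000  # arbitrary; large enough to take a noticeable amount of time
    print("sequential: %.2f s" % time_sequential(N))
    print("threaded:   %.2f s" % time_threaded(N))
```

On a GIL-based interpreter the threaded run does the same total work without any parallelism, so at best it matches the sequential time, and on the older GIL it can be considerably slower on a multicore machine.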

The GIL prevents several Python threads from executing bytecode at the same time. Whenever a thread needs to execute Python bytecode (the internal representation of the code), it must acquire the lock, which effectively stalls the threads on the other cores. When the lock changes hands between cores, the cache lines holding the lock and any interpreter state the previous holder modified have to be transferred to the new core; otherwise, the active thread would operate on stale data. The whole cache is not flushed, but this coherence traffic is expensive.

When you run the threads on a single CPU, they share one cache and no such transfer is necessary.

This should account for most of the slowdown. If you want to run Python code in parallel, you need to use processes and IPC (sockets, semaphores, memory-mapped I/O). But that can be slow for different reasons (data must be copied between processes).
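As a rough illustration of the process-based approach, the sketch below splits a CPU-bound loop across worker processes with multiprocessing.Pool. The names count_down, split_work and run_in_processes are hypothetical helpers, not anything from the original question:

```python
import multiprocessing


def count_down(n):
    """CPU-bound loop; returns the number of iterations it performed."""
    done = 0
    while n > 0:
        n -= 1
        done += 1
    return done


def split_work(total, parts):
    """Split `total` iterations into `parts` chunks whose sizes sum to `total`."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def run_in_processes(total, workers):
    """Run the loop in `workers` processes, each with its own interpreter and GIL."""
    chunks = split_work(total, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        # Arguments and results are pickled and copied between processes.
        return sum(pool.map(count_down, chunks))


if __name__ == "__main__":
    # The guard matters: platforms that spawn workers re-import this module.
    print(run_in_processes(20000000, 2))
```

Here only small integers cross the process boundary, so the copying cost is negligible. With large inputs or results, that cost can eat into the gain from running in parallel.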

Another approach is to move more code into a C library that does not hold the GIL while it executes. That would allow more code to execute in parallel.
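You can see this effect without writing any C yourself, because some standard library modules already release the GIL inside their C code. The hashlib documentation states that the GIL is released while hashing data larger than 2047 bytes, so threads hashing large buffers can overlap. A minimal sketch, with sha256_hex and hash_all_threaded as made-up helper names and the buffer sizes chosen arbitrarily:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data):
    """Hash one bytes object; the heavy lifting happens in C code."""
    return hashlib.sha256(data).hexdigest()


def hash_all_threaded(buffers, max_workers=4):
    """Hash every buffer in a thread pool, returning digests in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sha256_hex, buffers))


if __name__ == "__main__":
    # Four 16 MiB buffers, each filled with a different byte value.
    buffers = [bytes([i]) * (16 * 1024 * 1024) for i in range(4)]
    for digest in hash_all_threaded(buffers):
        print(digest)
```

Whether this actually beats a plain loop depends on the buffer sizes and the number of cores, so it is worth timing both on your own workload.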
