What is the global interpreter lock (GIL) in CPython?

悲哀的现实 2020-11-21 08:53

What is a global interpreter lock and why is it an issue?

A lot of noise has been made around removing the GIL from Python, and I'd like to understand why that is s

8 answers
  • 2020-11-21 09:29

    Python 3.7 documentation

    I would also like to highlight the following quote from the Python threading documentation:

    CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

    This links to the Glossary entry for global interpreter lock which explains that the GIL implies that threaded parallelism in Python is unsuitable for CPU bound tasks:

    The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.

    However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.

    Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.

    This quote also implies that dicts, and thus variable assignment, are thread safe as a CPython implementation detail:

    • Is Python variable assignment atomic?
    • Thread Safety in Python's dictionary
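
    Note that this only covers single operations on built-in types. A compound read-modify-write such as value += 1 is several steps, and the GIL does not make it atomic, so shared counters still need a lock. A minimal sketch of that pattern (Counter and count_with_threads are names I made up for illustration):

```python
import threading


class Counter:
    """Shared counter whose increment is protected by an explicit lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a load, an add and a store; the lock makes
        # the three steps behave as one unit with respect to other threads.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads, n_increments):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads(4, 10_000))
```

    With the lock in place the total is always n_threads * n_increments. Without it, updates may be lost depending on the interpreter version and timing.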

    Next, the docs for the multiprocessing package explain how it overcomes the GIL by spawning processes while exposing an interface similar to that of threading:

    multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

    And the docs for concurrent.futures.ProcessPoolExecutor explain that it uses multiprocessing as a backend:

    The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

    This should be contrasted with the other Executor subclass, ThreadPoolExecutor, which uses threads instead of processes:

    ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

    From this we conclude that ThreadPoolExecutor is mainly suitable for I/O-bound tasks, while ProcessPoolExecutor can also speed up CPU-bound tasks.
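
    To see the difference yourself, here is a rough sketch that runs the same CPU-bound function through both executors. count_primes and run_with are hypothetical helpers written for this answer, and the timings will vary by machine, so I am not quoting any numbers:

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits):
    """Map count_primes over limits using the given Executor subclass."""
    with executor_cls(max_workers=len(limits)) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: with ProcessPoolExecutor, worker processes may
    # re-import this module, and must not start new pools themselves.
    limits = [20_000] * 4
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run_with(cls, limits)
        print(cls.__name__, round(time.perf_counter() - start, 2), "s")
```

    Both executors return the same results. On a multi-core machine with a GIL-enabled CPython I would expect the process pool to finish sooner once the work is large enough to outweigh process start-up and pickling costs.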

    The following question asks why the GIL exists in the first place: Why the Global Interpreter Lock?

    Process vs thread experiments

    At Multiprocessing vs Threading Python I've done an experimental analysis of process vs threads in Python.


  • 2020-11-21 09:35

    Why Python (CPython and others) uses the GIL

    From http://wiki.python.org/moin/GlobalInterpreterLock

    In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
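
    To make "memory management is not thread-safe" a bit more concrete: CPython tracks object lifetimes with reference counts, which change whenever a reference is created or dropped. If several threads updated those counts at once without any synchronization, updates could be lost. The sketch below only shows that the count changes on an ordinary assignment. refcount_delta is a name I made up, and the absolute numbers returned by sys.getrefcount are an implementation detail, so only the difference is meaningful:

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows when aliased."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj  # binding a second name creates one more reference
    after = sys.getrefcount(obj)
    del alias  # dropping the name removes that reference again
    return after - before


if __name__ == "__main__":
    print(refcount_delta())
```

    On a standard CPython build I would expect the difference to be 1. Other Python implementations manage memory differently.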

    How to remove it from Python?

    Like Lua, Python could perhaps start multiple VMs, but it doesn't do that. I guess there are other reasons for this.

    In NumPy and some other Python extension libraries, releasing the GIL so that other threads can run can sometimes improve the efficiency of the whole program.
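
    The standard library has an example of this too: hashlib documents that it releases the GIL while hashing data larger than 2047 bytes, so plain threads can hash large buffers on several cores. A small sketch (digest and digest_all_threaded are hypothetical helper names, and the buffer sizes are arbitrary):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """SHA-256 of one buffer; the hashing itself runs in C."""
    return hashlib.sha256(data).hexdigest()


def digest_all_threaded(chunks, max_workers=4):
    """Hash every chunk in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(digest, chunks))


if __name__ == "__main__":
    # Four 8 MB buffers, large enough for the GIL to be released.
    chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for hex_digest in digest_all_threaded(chunks):
        print(hex_digest[:16])
```

    The threaded version gives the same digests as hashing sequentially. Whether it is actually faster depends on the buffer size and the number of cores.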

  • 2020-11-21 09:37

    Python's GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can't effectively make use of multiple cores. (If the GIL didn't lead to this problem, most people wouldn't care about the GIL - it's only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides. It might be too much information, but then you did ask for details :-)

    Note that Python's GIL is only really an issue for CPython, the reference implementation. Jython and IronPython don't have a GIL. As a Python developer, you don't generally come across the GIL unless you're writing a C extension. C extension writers need to release the GIL when their extensions do blocking I/O, so that other threads in the Python process get a chance to run.

  • 2020-11-21 09:39

    Suppose you have multiple threads which don't really touch each other's data. Those should execute as independently as possible. If you have a "global lock" which you need to acquire in order to (say) call a function, that can end up as a bottleneck. You can wind up not getting much benefit from having multiple threads in the first place.

    To put it into a real world analogy: imagine 100 developers working at a company with only a single coffee mug. Most of the developers would spend their time waiting for coffee instead of coding.

    None of this is Python-specific - I don't know the details of what Python needed a GIL for in the first place. However, hopefully it's given you a better idea of the general concept.

  • 2020-11-21 09:41

    Python doesn't allow multi-threading in the truest sense of the word. It has a multi-threading package but if you want to multi-thread to speed your code up, then it's usually not a good idea to use it. Python has a construct called the Global Interpreter Lock (GIL).

    https://www.youtube.com/watch?v=ph374fJqFPE

    The GIL makes sure that only one of your 'threads' can execute Python code at any one time. A thread acquires the GIL, does a little work, then passes the GIL on to the next thread. This happens very quickly, so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns. All this GIL passing adds overhead to execution. This means that if you want to make CPU-bound code run faster, then using the threading package often isn't a good idea.
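
    In CPython 3 the "does a little work" part is time-based: sys.getswitchinterval() reports how long a thread may run before the interpreter asks it to give up the GIL (0.005 seconds by default). Here is a rough sketch for comparing a CPU-bound loop run sequentially and in threads. spin, time_sequential and time_threaded are names I made up, and the timings depend on your machine and interpreter build:

```python
import sys
import threading
import time


def spin(n):
    """Pure-Python busy loop: CPU-bound and executed under the GIL."""
    total = 0
    for i in range(n):
        total += i
    return total


def time_sequential(n, repeats):
    """Run spin(n) repeats times, one after another; return seconds taken."""
    start = time.perf_counter()
    for _ in range(repeats):
        spin(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run spin(n) in repeats threads at once; return seconds taken."""
    threads = [threading.Thread(target=spin, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("switch interval (s):", sys.getswitchinterval())
    print("sequential (s):", round(time_sequential(2_000_000, 4), 2))
    print("threaded   (s):", round(time_threaded(2_000_000, 4), 2))
```

    On a GIL-enabled build I would expect the two timings to be in the same ballpark, since the threads cannot run the loop in parallel.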

    There are reasons to use Python's threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it's totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won't let you use extra CPU cores.
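
    A quick way to see the waiting case is to simulate I/O with time.sleep, which releases the GIL while it blocks. In this sketch fake_io and run_concurrently are hypothetical names, and the sleep stands in for a network or disk call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Pretend to wait on I/O; the GIL is released while sleeping."""
    time.sleep(delay)
    return delay


def run_concurrently(delays):
    """Run all waits in parallel threads; return (results, seconds taken)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently([0.5] * 4)
    print(results, round(elapsed, 2))
```

    The four waits overlap, so the total should be close to the longest single wait rather than the sum of all of them.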

    Multi-threading can be outsourced to the operating system (by doing multi-processing), to some external application that calls your Python code (e.g. Spark or Hadoop), or to some code that your Python code calls (e.g. you could have your Python code call a C function that does the expensive multi-threaded stuff).

  • 2020-11-21 09:41

    I want to share an example from the book Multithreading for Visual Effects. Here is a classic deadlock situation:

    static void MyCallback(const Context &context) {
        Auto<Lock> lock(GetMyMutexFromContext(context)); // acquires MyMutex
        ...
        EvalMyPythonString(str); // a function that takes the GIL
        ...
    }
    

    Now consider the following sequence of events, which results in a deadlock.

    ╔═══╦════════════════════════════════════════╦══════════════════════════════════════╗
    ║   ║ Main Thread                            ║ Other Thread                         ║
    ╠═══╬════════════════════════════════════════╬══════════════════════════════════════╣
    ║ 1 ║ Python Command acquires GIL            ║ Work started                         ║
    ║ 2 ║ Computation requested                  ║ MyCallback runs and acquires MyMutex ║
    ║ 3 ║                                        ║ MyCallback now waits for GIL         ║
    ║ 4 ║ MyCallback runs and waits for MyMutex  ║ waiting for GIL                      ║
    ╚═══╩════════════════════════════════════════╩══════════════════════════════════════╝
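
    The book's example is C++, but the same lock-ordering problem can be reproduced in Python with two ordinary locks. This is only a sketch: gil_stand_in is a plain threading.Lock playing the role of the GIL (it is not the real GIL), my_mutex plays MyMutex, and every name here is hypothetical. Timeouts are used so that the demo reports the deadlock instead of hanging:

```python
import threading


def simulate_lock_order_deadlock(timeout=0.2):
    """Two threads take two locks in opposite order; report who got stuck."""
    gil_stand_in = threading.Lock()  # stand-in for the GIL, not the real one
    my_mutex = threading.Lock()      # stand-in for MyMutex
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    acquired_second = {}

    def worker(name, first, second):
        with first:
            # Wait until each thread holds its first lock (rows 1 and 2).
            both_hold_first.wait()
            # Try for the lock the other thread holds (rows 3 and 4).
            got = second.acquire(timeout=timeout)
            acquired_second[name] = got
            # Keep holding the first lock until both attempts are over.
            both_tried_second.wait()
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("main", gil_stand_in, my_mutex)),
        threading.Thread(target=worker, args=("other", my_mutex, gil_stand_in)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired_second


if __name__ == "__main__":
    print(simulate_lock_order_deadlock())
```

    Neither thread manages to take its second lock, which is the situation in row 4 of the table. Without the timeouts both threads would wait forever.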
    