Python Multiprocessing Locks

暗喜 2021-02-04 00:30

This multiprocessing code works as expected. It creates 4 Python processes, and uses them to print the numbers 0 through 39, with a delay after each print.

impor         


        
3 Answers
  • 2021-02-04 00:50

    If you change pool.apply_async to pool.apply, you get this exception:

    Traceback (most recent call last):
      File "p.py", line 15, in <module>
        pool.apply(job, [l, i])
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 244, in apply
        return self.apply_async(func, args, kwds).get()
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
        raise self._value
    RuntimeError: Lock objects should only be shared between processes through inheritance
    

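    pool.apply_async is just hiding it: the exception is stored in the returned result object and is raised only when you call .get() on it. I hate to say this, but using a global variable is probably the simplest fix for your example. Let's just hope the velociraptors don't get you.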
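
    A minimal sketch of the global-variable approach, assuming the lock is handed to each worker once through the Pool's initializer and initargs parameters rather than through the task arguments. The names init_worker, job and run are made up for illustration, and this is not the original poster's code:

```python
import multiprocessing

_lock = None  # set once per worker process by init_worker (hypothetical name)


def init_worker(lock):
    """Runs once in every worker: keep the shared lock in a module-level global."""
    global _lock
    _lock = lock


def job(i):
    # The lock is reached through the global, not through the task arguments,
    # so it never has to be pickled onto the pool's task queue.
    with _lock:
        print(i, flush=True)
    return i


def run(n=8, workers=4):
    lock = multiprocessing.Lock()
    with multiprocessing.Pool(workers, initializer=init_worker,
                              initargs=(lock,)) as pool:
        results = [pool.apply_async(job, (i,)) for i in range(n)]
        # .get() re-raises any worker exception instead of hiding it.
        return sorted(r.get(timeout=30) for r in results)


if __name__ == "__main__":
    print(run())
```

    Calling .get() on every result also means that any failure shows up immediately instead of being swallowed.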

  • 2021-02-04 00:56

    I think the reason is that the multiprocessing pool uses pickle to transfer objects between the processes. However, a Lock cannot be pickled:

    >>> import multiprocessing
    >>> import pickle
    >>> lock = multiprocessing.Lock()
    >>> lp = pickle.dumps(lock)
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        lp = pickle.dumps(lock)
    ...
    RuntimeError: Lock objects should only be shared between processes through inheritance
    >>> 
    

    See the "Picklability" and "Better to inherit than pickle/unpickle" sections of https://docs.python.org/2/library/multiprocessing.html#all-platforms
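
    If the lock really has to travel as a task argument, one possible workaround is a lock created by multiprocessing.Manager(). Such a lock is a proxy object that can be pickled, at the cost of a round trip to the manager process on every acquire and release. A rough sketch, with job_with_lock and run_with_manager_lock as illustrative names only:

```python
import multiprocessing


def job_with_lock(lock, i):
    # `lock` is a proxy; acquire/release are forwarded to the manager process.
    with lock:
        return i * i


def run_with_manager_lock(n=4, workers=2):
    with multiprocessing.Manager() as manager:
        lock = manager.Lock()  # picklable proxy, unlike multiprocessing.Lock()
        with multiprocessing.Pool(workers) as pool:
            results = [pool.apply_async(job_with_lock, (lock, i))
                       for i in range(n)]
            return [r.get(timeout=30) for r in results]


if __name__ == "__main__":
    print(run_with_manager_lock())
```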

  • 2021-02-04 01:00

    Other answers already explain that apply_async fails silently unless an error_callback argument is provided (or .get() is called on the result). I still found the OP's other point valid: the official docs do show a multiprocessing.Lock being passed around as a function argument. In fact, the sub-section titled "Explicitly pass resources to child processes" in Programming guidelines recommends passing a multiprocessing.Lock object as a function argument instead of using a global variable. I have also written a lot of code that passes a multiprocessing.Lock as an argument to a child process, and it all works as expected.
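
    To make the silent failure visible, here is a small sketch that passes error_callback to apply_async. It assumes Python 3, and the helper name collect_errors is invented for the example:

```python
import multiprocessing


def job(lock, i):
    with lock:
        return i


def collect_errors():
    """Submit a task that carries a plain Lock and record what goes wrong."""
    errors = []
    lock = multiprocessing.Lock()
    with multiprocessing.Pool(2) as pool:
        # The task cannot be pickled onto the pool's queue; the resulting
        # exception is handed to error_callback instead of being printed.
        result = pool.apply_async(job, (lock, 0), error_callback=errors.append)
        result.wait(timeout=30)
    return errors


if __name__ == "__main__":
    for exc in collect_errors():
        print(type(exc).__name__, exc)
```

    Without the callback (and without calling .get() on the result), nothing in the program's output would hint that the task never ran.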

    So, what gives?

    I first checked whether multiprocessing.Lock can be pickled. With Python 3 (CPython on macOS), trying to pickle a multiprocessing.Lock produces the same RuntimeError that others encountered.

    >>> pickle.dumps(multiprocessing.Lock())
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-7-66dfe1355652> in <module>
    ----> 1 pickle.dumps(multiprocessing.Lock())
    
    /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/synchronize.py in __getstate__(self)
         99
        100     def __getstate__(self):
    --> 101         context.assert_spawning(self)
        102         sl = self._semlock
        103         if sys.platform == 'win32':
    
    /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in assert_spawning(obj)
        354         raise RuntimeError(
        355             '%s objects should only be shared between processes'
    --> 356             ' through inheritance' % type(obj).__name__
        357             )
    
    RuntimeError: Lock objects should only be shared between processes through inheritance
    

    To me, this confirms that multiprocessing.Lock is indeed not pickle-able.

    Aside begins

    Still, the same lock has to be shared across two or more Python processes, each with its own, potentially different address space (for example when "spawn" or "forkserver" is the start method). multiprocessing must be doing something special to send a Lock across processes. This other StackOverflow post seems to indicate that on Unix systems multiprocessing.Lock may be implemented via named semaphores that are supported by the OS itself (outside Python). Two or more Python processes can then attach to the same lock, which effectively lives in one place outside both of them. There may be a shared memory implementation as well.

    Aside ends

    Can we pass multiprocessing.Lock object as an argument or not?

    After a few more experiments and more reading, it appears that the difference is between multiprocessing.Pool and multiprocessing.Process.

    multiprocessing.Process lets you pass a multiprocessing.Lock as an argument, but multiprocessing.Pool does not accept one among a task's arguments. Here is an example that works:

    import multiprocessing
    import time
    from multiprocessing import Process, Lock
    
    
    def task(n: int, lock):
        with lock:
            print(f'n={n}')
        time.sleep(0.25)
    
    
    if __name__ == '__main__':
        multiprocessing.set_start_method('forkserver')
        lock = Lock()
        processes = [Process(target=task, args=(i, lock)) for i in range(20)]
        for process in processes:
            process.start()
        for process in processes:
            process.join()
    

    Note that the if __name__ == '__main__' guard is essential, as mentioned in the "Safe importing of main module" sub-section of Programming guidelines.

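    multiprocessing.Pool hands tasks to its workers through multiprocessing's own SimpleQueue (not queue.SimpleQueue), so every task and its arguments are pickled on the way in, and that is where the Lock fails. multiprocessing.Process does pickle its arguments under "spawn" and "forkserver", but it does so while the child process is being spawned, which is exactly the situation in which the assert_spawning check seen in the traceback above lets a Lock through.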
