Nested parallelism in Python

失恋的感觉 2020-12-30 02:43

I am trying out multiprocessor programming with Python. Take a divide and conquer algorithm like Fibonacci for example. The program flow of execution would bran

2 Answers
  •  一生所求
    2020-12-30 02:52

    1) What am I missing here; why shouldn't a Pool be shared between processes?

    Not all objects are picklable (serializable). In this case, the pool uses a threading.Lock, which cannot be pickled:

    >>> import threading, pickle
    >>> pickle.dumps(threading.Lock())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    [...]
      File "/Users/rafael/dev/venvs/general/bin/../lib/python2.7/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle lock objects
    

    or better:

    >>> import threading, pickle
    >>> from concurrent.futures import ThreadPoolExecutor
    >>> pickle.dumps(ThreadPoolExecutor(1))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
    [...]
      File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/Users/rafael/dev/venvs/general/bin/../lib/python2.7/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle lock objects
    

    If you think about it, this makes sense: a lock is a semaphore primitive managed by the operating system (since Python uses native threads). Pickling that object and saving its state inside the Python runtime would not accomplish anything meaningful, because its true state is kept by the OS.
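
The same behaviour can be checked on Python 3 with a small sketch. The helper name `is_picklable` is made up for this illustration, and the exact error message differs between Python versions, so the sketch only records whether pickling succeeded:

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


def is_picklable(obj):
    """Hypothetical helper: True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


plain_data = is_picklable({"n": 10})        # ordinary data pickles fine
lock = is_picklable(threading.Lock())       # OS-backed lock does not

executor = ThreadPoolExecutor(max_workers=1)
pool = is_picklable(executor)               # holds a lock and a work queue internally
executor.shutdown()

print(plain_data, lock, pool)
```

Plain data serializes, while the lock and the executor that contains one do not, which is why such objects cannot simply be handed to another process.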

    2) What is a pattern for implementing nested parallelism in Python? If possible, maintaining a recursive structure, and not trading it for iteration

    Now, for the prestige: everything I mentioned above doesn't really apply to your example, since you are using threads (ThreadPoolExecutor) and not processes (ProcessPoolExecutor), so nothing has to be shared across processes.

    Your Java example just appears to be more efficient because the thread pool you are using (CachedThreadPool) creates new threads as needed, whereas the Python executor implementations are bounded and require an explicit maximum thread count (max_workers). A few syntax differences between the languages also seem to be throwing you off (static instances in Python are essentially anything not explicitly scoped), but essentially both examples would create exactly the same number of threads in order to execute. For instance, here's an example using a fairly naive CachedThreadPoolExecutor implementation in Python:

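    from concurrent.futures import ThreadPoolExecutor
    
    class CachedThreadPoolExecutor(ThreadPoolExecutor):
        """Naive pool that grows whenever work is already waiting in the queue."""
    
        def __init__(self):
            super(CachedThreadPoolExecutor, self).__init__(max_workers=1)
    
        def submit(self, fn, *args, **extra):
            # Relies on private attributes (_work_queue, _max_workers), so it may
            # break between Python versions; the queue-size check is also racy.
            if self._work_queue.qsize() > 0:
                print('increasing pool size from %d to %d' % (self._max_workers, self._max_workers + 1))
                self._max_workers += 1
    
            return super(CachedThreadPoolExecutor, self).submit(fn, *args, **extra)
    
    # Module-level pool shared by every recursive call
    pool = CachedThreadPoolExecutor()
    
    def fibonacci(n):
        print(n)
        if n < 2:
            return n
        # Each call blocks in result() until both of its children finish
        a = pool.submit(fibonacci, n - 1)
        b = pool.submit(fibonacci, n - 2)
        return a.result() + b.result()
    
    print(fibonacci(10))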
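
To see why the bounded pool size matters for nested submissions, here is a minimal sketch (not part of the original answer). The names `inner`, `outer` and `run` are hypothetical, and the 0.5 second timeout is an arbitrary value chosen only so the demonstration terminates instead of hanging:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def inner():
    return 42


def outer(pool):
    # Submit nested work to the same pool, then block waiting for it.
    future = pool.submit(inner)
    try:
        return future.result(timeout=0.5)
    except FutureTimeout:
        # No free worker was available to run inner() while we were waiting.
        return "starved"


def run(max_workers):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return pool.submit(outer, pool).result()


starved = run(1)   # the only worker is busy running outer()
ok = run(2)        # a second worker can pick up inner()
print(starved, ok)
```

With a single worker the nested task cannot start until its parent gives up waiting; with two workers it completes normally. A recursive Fibonacci that blocks in result() needs enough workers for every call that is waiting at the same time, which is the reason the pool above grows on demand.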
    

    Performance tuning:

    I strongly suggest looking into gevent since it will give you high concurrency without the thread overhead. This is not always the case but your code is actually the poster child for gevent usage. Here's an example:

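    import gevent
    
    def fibonacci(n):
        print(n)
        if n < 2:
            return n
        # Greenlets are cooperatively scheduled, so spawning one is cheap
        a = gevent.spawn(fibonacci, n - 1)
        b = gevent.spawn(fibonacci, n - 2)
        # get() waits for the greenlet to finish and returns its result
        return a.get() + b.get()
    
    print(fibonacci(10))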
    

    This is completely unscientific, but on my computer the code above runs 9x faster than its threaded equivalent.
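
If installing gevent is not an option, roughly the same cooperative structure can be sketched with the standard library's asyncio. This is a swapped-in technique, not what the answer measured, so the 9x figure does not carry over to it:

```python
import asyncio


async def fibonacci(n):
    if n < 2:
        return n
    # Run both recursive calls concurrently on the event loop and
    # wait for the pair of results.
    a, b = await asyncio.gather(fibonacci(n - 1), fibonacci(n - 2))
    return a + b


result = asyncio.run(fibonacci(10))
print(result)
```

As with gevent, no OS thread is created per call, so the recursion keeps its shape without needing a growing thread pool.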

    I hope this helps.
