Parallelizing a Numpy vector operation

别那么骄傲 2020-11-28 02:34

Let's use, for example, numpy.sin()

The following code will return the value of the sine for each value of the array a:

im         


        
3 Answers
  • 2020-11-28 02:36

    SciPy actually has a pretty good writeup on this subject here: http://wiki.scipy.org/ParallelProgramming

    Edit: that link is dead; the page can now be found at: http://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html

  • 2020-11-28 02:49

    Well, here is an interesting thing to note. If you run the following commands:

    import numpy
    from multiprocessing import Pool
    a = numpy.arange(1000000)    
    pool = Pool(processes = 5)
    result = pool.map(numpy.sin, a)
    
    UnpicklingError: NEWOBJ class argument has NULL tp_new
    

    I wasn't expecting that. So what's going on? Well:

    >>> help(numpy.sin)
       Help on ufunc object:
    
    sin = class ufunc(__builtin__.object)
     |  Functions that operate element by element on whole arrays.
     |  
     |  To see the documentation for a specific ufunc, use np.info().  For
     |  example, np.info(np.sin).  Because ufuncs are written in C
     |  (for speed) and linked into Python with NumPy's ufunc facility,
     |  Python's help() function finds this page whenever help() is called
     |  on a ufunc.
    

    Yep, numpy.sin is a ufunc implemented in C, and as the error shows it could not be pickled here, so you can't really use it directly with multiprocessing.
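
As a quick way to check this on your own setup, the sketch below round-trips an object through pickle, which is how multiprocessing ships a function to its worker processes. The helper name `survives_pickle` is made up for illustration. The result for numpy.sin is printed rather than assumed, because it may differ from the NumPy version that produced the error above.

```python
import pickle

import numpy


def survives_pickle(obj):
    """Return True if obj can be pickled and unpickled again (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        # Covers PicklingError, UnpicklingError and related failures.
        return False


if __name__ == '__main__':
    # Outcome depends on the installed NumPy version, so just report it.
    print('numpy.sin picklable: %s' % survives_pickle(numpy.sin))
    # Lambdas cannot be pickled by the standard pickle module.
    print('lambda picklable: %s' % survives_pickle(lambda x: x))
```

If the check fails for the function you want to run in a pool, wrapping it in a module-level Python function is the usual workaround.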

    So we have to wrap it in another function.

    perf.py:

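    import time
    import numpy
    from multiprocessing import Pool

    def numpy_sin(value):
        # Plain Python wrapper around the C ufunc so it can be sent to workers.
        return numpy.sin(value)

    if __name__ == '__main__':
        a = numpy.arange(1000000)
        pool = Pool(processes=5)

        # Baseline: one vectorized call over the whole array.
        start = time.time()
        result = numpy.sin(a)
        end = time.time()
        print('Singled threaded %f' % (end - start))

        # pool.map iterates over the array, so numpy_sin is called once per element.
        start = time.time()
        result = pool.map(numpy_sin, a)
        pool.close()
        pool.join()
        end = time.time()
        print('Multithreaded %f' % (end - start))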
    
    
    $ python perf.py 
    Singled threaded 0.032201
    Multithreaded 10.550432
    

    Wow, I wasn't expecting that either. There are a couple of issues. For starters, we are calling a Python function, even if it's just a wrapper, instead of a pure C function, and we call it once for every element. There is also the overhead of copying the values: multiprocessing doesn't share data by default, so each value has to be copied back and forth between processes.

    Note what happens, though, if we segment the data properly:

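    import time
    import numpy
    from multiprocessing import Pool

    def numpy_sin(value):
        # Each call now receives a whole chunk, not a single number.
        return numpy.sin(value)

    if __name__ == '__main__':
        # Ten chunks of 100,000 values each.
        a = [numpy.arange(100000) for _ in range(10)]
        pool = Pool(processes=5)

        # a is a list of arrays, so numpy.sin first converts it to a (10, 100000)
        # array; that conversion is included in this timing.
        start = time.time()
        result = numpy.sin(a)
        end = time.time()
        print('Singled threaded %f' % (end - start))

        # One task per chunk: only ten function calls and ten result transfers.
        start = time.time()
        result = pool.map(numpy_sin, a)
        pool.close()
        pool.join()
        end = time.time()
        print('Multithreaded %f' % (end - start))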
    
    $ python perf.py 
    Singled threaded 0.150192
    Multithreaded 0.055083
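
The segmentation above builds the chunks by hand. As a rough sketch of the same idea for an existing array, the code below splits it with numpy.array_split, maps a function over the pieces, and joins the results with numpy.concatenate. The names `chunked_apply` and `map_func` are invented for this illustration; the pool size of 5 and the 10 chunks simply mirror the example above and are not tuned values.

```python
import numpy
from multiprocessing import Pool


def numpy_sin(chunk):
    # Module-level wrapper so it can be sent to worker processes.
    return numpy.sin(chunk)


def chunked_apply(func, a, n_chunks, map_func=map):
    """Apply func to n_chunks pieces of a and reassemble the result.

    map_func defaults to the built-in serial map; pass pool.map to
    distribute the pieces over worker processes instead.
    """
    pieces = numpy.array_split(a, n_chunks)
    results = list(map_func(func, pieces))
    return numpy.concatenate(results)


if __name__ == '__main__':
    a = numpy.arange(1000000)
    with Pool(processes=5) as pool:
        result = chunked_apply(numpy_sin, a, 10, map_func=pool.map)
    # Sanity check against the plain vectorized call.
    print(numpy.allclose(result, numpy.sin(a)))
```

Keeping the mapping function as a parameter makes it easy to run the same code serially first and confirm the output before adding a pool.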
    

    So what can we take from this? Multiprocessing is great, but we should always test and compare: sometimes it's faster and sometimes it's slower, depending on how it's used.

    Granted, you are not using numpy.sin but another function. I would recommend you first verify that multiprocessing really does speed up the computation, since the overhead of copying values back and forth may outweigh the gain.
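
One possible way to run that check is sketched below: time the serial call and the pooled call several times and compare the best runs. `best_time` and `my_function` are hypothetical names, and `my_function` is only a stand-in for whatever computation you actually have; the array size, chunk count, and pool size are taken from the examples above.

```python
import time

import numpy


def best_time(func, repeat=3):
    """Return the fastest wall-clock time in seconds over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def my_function(chunk):
    # Hypothetical stand-in for the real, more expensive computation.
    return numpy.sqrt(numpy.abs(numpy.sin(chunk)))


if __name__ == '__main__':
    from multiprocessing import Pool

    a = numpy.arange(1000000)
    pieces = numpy.array_split(a, 10)
    with Pool(processes=5) as pool:
        serial = best_time(lambda: my_function(a))
        parallel = best_time(lambda: pool.map(my_function, pieces))
    print('serial   %f' % serial)
    print('parallel %f' % parallel)
```

Taking the minimum over a few repeats reduces noise from other processes; the pool is created before timing starts, so its startup cost is not counted here.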

    Either way, I do believe that pool.map is the best, safest way to parallelize code.

    I hope this helps.

  • 2020-11-28 02:59

    There is a better way: numexpr

    Slightly reworded from their main page:

    It's a multi-threaded VM written in C that analyzes expressions, rewrites them more efficiently, and compiles them on the fly into code that gets near-optimal parallel performance for both memory-bound and CPU-bound operations.

    For example, on my 4-core machine, evaluating a sine is just slightly less than 4 times faster than with numpy.

    In [1]: import numpy as np
    In [2]: import numexpr as ne
    In [3]: a = np.arange(1000000)
    In [4]: timeit ne.evaluate('sin(a)')
    100 loops, best of 3: 15.6 ms per loop    
    In [5]: timeit np.sin(a)
    10 loops, best of 3: 54 ms per loop
    

    See the numexpr documentation for the list of supported functions. You'll have to check it, or give us more information, to see whether your more complicated function can be evaluated by numexpr.
