Parfor for Python

Asked 2020-12-07 17:46

I am looking for a definitive answer to MATLAB's parfor for Python (Scipy, Numpy).

Is there a solution similar to parfor? If not, what is the complication for creating one?

8 Answers
  • 2020-12-07 17:56

    I've always used Parallel Python, but it's not a complete analog since I believe it typically uses separate processes, which can be expensive on certain operating systems. Still, if the bodies of your loops are chunky enough, this won't matter and can actually have some benefits.
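
    For reference, a minimal sketch of what that typically looks like, assuming the classic pp API (a pp.Server whose submit() returns a job object you call to get the result):

    import pp

    def square(x):
        # stand-in for a chunky per-item computation
        return x ** 2

    job_server = pp.Server()  # one worker process per detected CPU by default
    jobs = [job_server.submit(square, (x,)) for x in range(10)]
    results = [job() for job in jobs]  # calling a job blocks until its result is ready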

  • 2020-12-07 18:03

    There are many Python frameworks for parallel computing. The one I happen to like most is IPython, but I don't know too much about any of the others. In IPython, one analogue to parfor would be client.MultiEngineClient.map() or some of the other constructs in the documentation on quick and easy parallelism.
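
    The MultiEngineClient interface has since been superseded, so purely as a rough sketch, the same idea with the newer ipyparallel package (assuming engines were started with something like ipcluster start -n 4) might look like:

    import ipyparallel as ipp

    rc = ipp.Client()               # connect to the running engines
    view = rc.load_balanced_view()  # schedule work on whichever engine is free

    def square(x):
        # stand-in for a chunky per-item computation
        return x ** 2

    results = view.map_sync(square, range(10))  # parfor-style map over the engines
    print(results)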

  • 2020-12-07 18:03

    OK, I'll also give it a go; let's see if my way is easier.

    from multiprocessing import Pool

    def heavy_func(key):
        # do some heavy computation on each key
        output = key**2
        return key, output

    output_data = {}         # <-- this dict will store the results
    keys = [1, 5, 7, 8, 10]  # <-- compute heavy_func over all the values of keys
    with Pool(processes=40) as pool:
        for key, result in pool.imap_unordered(heavy_func, keys):
            output_data[key] = result
    
    

    Now output_data is a dictionary that contains, for every key, the result of the computation on that key.

    That is it.

  • 2020-12-07 18:08

    This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

    To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.

    import numpy as np
    import time
    
    import ray
    
    ray.init()
    
    # Define the function. Each remote function will be executed 
    # in a separate process.
    @ray.remote
    def HeavyComputationThatIsThreadSafe(i, j):
        n = i*j
        time.sleep(0.5) # Simulate some heavy computation. 
        return n
    
    N = 10
    output_ids = []
    for i in range(N):
        for j in range(N):
            # Remote functions return a future, i.e, an identifier to the 
            # result, rather than the result itself. This allows invoking
            # the next remote function before the previous finished, which
            # leads to the remote functions being executed in parallel.
            output_ids.append(HeavyComputationThatIsThreadSafe.remote(i,j))
    
    # Get results when ready.
    output_list = ray.get(output_ids)
    # Move results into an NxN numpy array.
    outputs = np.array(output_list).reshape(N, N)
    
    # This program should take approximately N*N*0.5s/p, where
    # p is the number of cores on your machine, N*N
    # is the number of times we invoke the remote function,
    # and 0.5s is the time it takes to execute one instance
    # of the remote function. For example, for two cores this
    # program will take approximately 25sec. 
    

    There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.

    Note: One point to keep in mind is that each remote function is executed in a separate process, possibly on a different machine, so the remote function's computation should take significantly longer than the cost of invoking it. As a rule of thumb, a remote function's computation should take at least a few tens of milliseconds to amortize the scheduling and startup overhead of a remote function.
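
    If each individual item is much cheaper than that, one common way to respect this rule of thumb is to batch many small items into a single remote call; a minimal sketch, reusing the ray.init() from the example above (the batch size here is arbitrary):

    @ray.remote
    def process_batch(items):
        # One remote call covers many small items, so the scheduling and
        # startup overhead is paid once per batch instead of once per item.
        return [i * i for i in items]

    data = list(range(10000))
    batch_size = 1000
    futures = [process_batch.remote(data[k:k + batch_size])
               for k in range(0, len(data), batch_size)]
    results = [x for batch in ray.get(futures) for x in batch]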

  • 2020-12-07 18:10

    Jupyter Notebook

    To see an example, consider that you want to write the equivalent of this MATLAB code in Python:

    matlabpool open 4
    parfor n=0:9
       for i=1:10000
           for j=1:10000
               s=j*i   
           end
       end
       n
    end
    disp('done')
    

    Here is how one may write this in Python, particularly in a Jupyter notebook. You have to create a file in the working directory (I called it FunForParFor.py) which contains the following:

    def func(n):
        for i in range(10000):
            for j in range(10000):
                s=j*i
        print(n)
    

    Then I go to my Jupyter notebook and write the following code:

    import multiprocessing  
    import FunForParFor
    
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=4)
        pool.map(FunForParFor.func, range(10))
        pool.close()
        pool.join()   
        print('done')
    

    This has worked for me! I just wanted to share it here to give you a particular example.

  • 2020-12-07 18:11

    The one built in to Python would be multiprocessing; the docs are here. I always use multiprocessing.Pool with as many workers as processors. Then whenever I need a for-loop-like structure I use Pool.imap.

    As long as the body of your function does not depend on any previous iteration, you should get near-linear speed-up. This also requires that your inputs and outputs are pickle-able, but that is pretty easy to ensure for standard types.

    UPDATE: Some code for your updated function just to show how easy it is:

    from multiprocessing import Pool
    from itertools import product
    import numpy as np

    # Fun and N come from your question: Fun takes a single (i, j) tuple and returns a scalar.
    output = np.zeros((N, N))
    pool = Pool()   # defaults to the number of available CPUs
    chunksize = 20  # this may take some guessing ... take a look at the docs to decide
    for ind, res in enumerate(pool.imap(Fun, product(range(N), range(N)), chunksize)):
        output.flat[ind] = res
    pool.close()
    pool.join()
    