multiprocessing.Pool: When to use apply, apply_async or map?

遥遥无期 2020-11-22 15:45

I have not seen clear examples with use-cases for Pool.apply, Pool.apply_async and Pool.map. I am mainly using Pool.map; what are the advantages of others?

3 Answers
  • 2020-11-22 16:10

    Here is an overview in table form of the differences between Pool.apply, Pool.apply_async, Pool.map, Pool.map_async, Pool.starmap and Pool.starmap_async. When choosing one, take multiple arguments, concurrency, blocking, and result ordering into account:

                            | Multi-args   Concurrency    Blocking     Ordered results
    ------------------------------------------------------------------------------------
    Pool.map                | no           yes            yes          yes
    Pool.map_async          | no           yes            no           yes
    Pool.apply              | yes          no             yes          yes
    Pool.apply_async        | yes          yes            no           no
    Pool.starmap            | yes          yes            yes          yes
    Pool.starmap_async      | yes          yes            no           yes
    

    Notes:

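    • Pool.imap and Pool.imap_unordered: lazier versions of map. They return an iterator that yields results as they become available, in input order for imap and in completion order for imap_unordered.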

    • Pool.starmap is very similar to map, except that it accepts multiple arguments: each element of the iterable is unpacked into the function's arguments.

    • Async methods submit all the tasks at once and return immediately with a result object. Call its get() method to obtain the results; get() blocks until they are finished.

    • Pool.map (or Pool.apply) is very similar to Python's built-in map (or Python 2's built-in apply). Both block the main process until all submitted tasks complete and then return the result.
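    As a minimal sketch of the lazy variants mentioned in the notes, the following compares Pool.imap and Pool.imap_unordered. slow_square is a hypothetical job whose sleep makes the larger inputs finish first, and the pool size of 3 is an arbitrary choice.

```python
import time
from multiprocessing import Pool


def slow_square(x):
    # Hypothetical job: smaller inputs sleep longer, so they finish last
    time.sleep(0.1 * (3 - x))
    return x * x


def main():
    with Pool(processes=3) as pool:
        # imap yields results lazily, but always in input order
        ordered = list(pool.imap(slow_square, [0, 1, 2]))
        # imap_unordered yields each result as soon as a worker finishes it
        unordered = list(pool.imap_unordered(slow_square, [0, 1, 2]))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = main()
    print(ordered)    # [0, 1, 4]
    print(unordered)  # typically [4, 1, 0], but completion order is not guaranteed
```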

    Examples:

    map

    Called once for a whole list of jobs

    results = pool.map(func, [1, 2, 3])
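    The map snippet assumes an existing pool and a module-level func. A self-contained version might look like this; func is a hypothetical squaring job, and the `if __name__ == "__main__"` guard matters on platforms that start workers with the spawn method.

```python
from multiprocessing import Pool


def func(x):
    # Hypothetical job: square a single argument
    return x * x


def main():
    # map blocks until every job is done, so the results are ready before
    # the with-block exits (which terminates the pool)
    with Pool() as pool:
        results = pool.map(func, [1, 2, 3])
    return results


if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```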
    

    apply

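    Handles one job per call, and each call blocks until that job has finished

    results = []
    for x, y in [[1, 1], [2, 2]]:
        results.append(pool.apply(func, (x, y)))

    # Callback used by the async examples below; it runs in the main
    # process each time a result becomes available
    def collect_result(result):
        results.append(result)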
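    To see that Pool.apply really runs each call in a worker process, this sketch uses a hypothetical func that also reports os.getpid(), so the worker PID can be compared with the parent's:

```python
import os
from multiprocessing import Pool


def func(x, y):
    # Hypothetical job: return the sum and the PID of the process running it
    return x + y, os.getpid()


def main():
    results = []
    with Pool(processes=2) as pool:
        for x, y in [[1, 1], [2, 2]]:
            # Each call blocks until one worker has run func(x, y)
            results.append(pool.apply(func, (x, y)))
    return results


if __name__ == "__main__":
    for value, pid in main():
        print(value, "computed in worker", pid, "; parent is", os.getpid())
```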
    

    map_async

    Called once for a whole list of jobs; it returns immediately

    # collect_result is called once, with the complete list of results
    pool.map_async(func, jobs, callback=collect_result)
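    One detail that is easy to miss: the map_async callback receives the whole result list in a single call rather than one item at a time. A minimal sketch, with a hypothetical func and a local callback_calls list used only to record what the callback saw:

```python
from multiprocessing import Pool


def func(x):
    # Hypothetical job
    return x * x


def main():
    callback_calls = []

    def collect_result(result):
        # Runs in the main process; map_async passes the complete list
        callback_calls.append(result)

    with Pool() as pool:
        async_result = pool.map_async(func, [1, 2, 3], callback=collect_result)
        # map_async has already returned; get() blocks until all jobs finish
        results = async_result.get()
    return results, callback_calls


if __name__ == "__main__":
    results, calls = main()
    print(results)  # [1, 4, 9]
    print(calls)    # [[1, 4, 9]]: one call carrying the whole list
```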
    

    apply_async

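    Handles one job per call; the call returns immediately and the job runs in the background, so several submitted jobs execute in parallel

    for x, y in [[1, 1], [2, 2]]:
        # collect_result is called once per job, in completion order
        pool.apply_async(func, (x, y), callback=collect_result)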
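    If you want the parallelism of apply_async but also need input order, one option is to keep the AsyncResult objects and call get() on them in submission order instead of relying on a callback. A sketch with a hypothetical func:

```python
from multiprocessing import Pool


def func(x, y):
    # Hypothetical job
    return x * y


def main():
    jobs = [(1, 1), (2, 2), (3, 3)]
    with Pool() as pool:
        # Every call returns an AsyncResult at once; the jobs run in parallel
        handles = [pool.apply_async(func, (x, y)) for x, y in jobs]
        # Reading the handles in submission order restores the input order
        results = [handle.get() for handle in handles]
    return results


if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```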
    

    starmap

    A variant of pool.map that supports multiple arguments: each tuple in the iterable is unpacked into the function's arguments

    pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
    

    starmap_async

    A combination of starmap() and map_async() that iterates over an iterable of iterables and calls func with each iterable unpacked. It returns a result object immediately.

    # collect_result is called once, with the complete list of results in input order
    pool.starmap_async(func, [(1, 1), (2, 1), (3, 1)], callback=collect_result)
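    A runnable sketch of both starmap variants, using a hypothetical two-argument func. It also shows that starmap_async, like map_async, returns results in input order once get() returns:

```python
from multiprocessing import Pool


def func(x, y):
    # Hypothetical two-argument job
    return x + y


def main():
    pairs = [(1, 1), (2, 1), (3, 1)]
    with Pool() as pool:
        # starmap blocks and calls func(1, 1), func(2, 1), func(3, 1)
        blocking = pool.starmap(func, pairs)
        # starmap_async returns at once; get() waits and keeps input order
        non_blocking = pool.starmap_async(func, pairs).get()
    return blocking, non_blocking


if __name__ == "__main__":
    print(main())  # ([2, 3, 4], [2, 3, 4])
```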
    

    Reference:

    Find complete documentation here: https://docs.python.org/3/library/multiprocessing.html

  • 2020-11-22 16:16

    Regarding apply vs map:

    pool.apply(f, args): f is executed in only ONE of the pool's workers. So ONE of the processes in the pool will run f(*args).

    pool.map(f, iterable): This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. So you take advantage of all the processes in the pool.
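    The chunking can be controlled through map's chunksize parameter. In this sketch, tag_with_pid is a hypothetical helper, and 4 workers with chunksize=5 are arbitrary choices. Each chunk is a single task handled by one worker, so every group of 5 consecutive items shares a PID, while different chunks may or may not land on different workers.

```python
import os
from multiprocessing import Pool


def tag_with_pid(x):
    # Hypothetical helper: report which worker processed each item
    return x, os.getpid()


def main():
    with Pool(processes=4) as pool:
        # chunksize=5 turns range(20) into 4 tasks of 5 consecutive items
        return pool.map(tag_with_pid, range(20), chunksize=5)


if __name__ == "__main__":
    pairs = main()
    for start in range(0, 20, 5):
        chunk_pids = {pid for _, pid in pairs[start:start + 5]}
        print(f"items {start}-{start + 4}: worker(s) {chunk_pids}")
```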

  • 2020-11-22 16:23

    Back in the old days of Python, to call a function with arbitrary arguments, you would use apply:

    apply(f,args,kwargs)
    

    apply still exists in Python 2.7 though not in Python 3, and is generally not used anymore. Nowadays,

    f(*args,**kwargs)
    

    is preferred. The multiprocessing.Pool class tries to provide a similar interface.
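    For reference, here is the Python 3 spelling of the old apply(f, args, kwargs), with a hypothetical f that takes both positional and keyword arguments:

```python
def f(a, b, scale=1):
    # Hypothetical function with positional and keyword parameters
    return (a + b) * scale


args = (1, 2)
kwargs = {"scale": 10}

# Python 3 equivalent of the old apply(f, args, kwargs)
result = f(*args, **kwargs)
print(result)  # 30
```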

    Pool.apply is like Python apply, except that the function call is performed in a separate process. Pool.apply blocks until the function is completed.

    Pool.apply_async is also like Python's built-in apply, except that the call returns immediately instead of waiting for the result. An AsyncResult object is returned. You call its get() method to retrieve the result of the function call. The get() method blocks until the function is completed. Thus, pool.apply(func, args, kwargs) is equivalent to pool.apply_async(func, args, kwargs).get().
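    The equivalence between pool.apply and pool.apply_async(...).get() can be checked directly. power is a hypothetical function, and both calls pass a positional args tuple plus a kwds dict:

```python
from multiprocessing import Pool


def power(base, exponent=2):
    # Hypothetical job taking a positional and a keyword argument
    return base ** exponent


def main():
    with Pool(processes=2) as pool:
        # Blocking: the value itself is returned
        blocking = pool.apply(power, (3,), {"exponent": 3})
        # Non-blocking: an AsyncResult is returned; get() waits for the value
        deferred = pool.apply_async(power, (3,), {"exponent": 3}).get()
    return blocking, deferred


if __name__ == "__main__":
    print(main())  # (27, 27)
```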

    In contrast to Pool.apply, the Pool.apply_async method also has a callback which, if supplied, is called when the function is complete. This can be used instead of calling get().

    For example:

    import multiprocessing as mp
    import time
    
    def foo_pool(x):
        time.sleep(2)
        return x*x
    
    result_list = []
    def log_result(result):
        # This is called whenever foo_pool(i) returns a result.
        # result_list is modified only by the main process, not the pool workers.
        result_list.append(result)
    
    def apply_async_with_callback():
        pool = mp.Pool()
        for i in range(10):
            pool.apply_async(foo_pool, args = (i, ), callback = log_result)
        pool.close()
        pool.join()
        print(result_list)
    
    if __name__ == '__main__':
        apply_async_with_callback()
    

    may yield a result such as

    [1, 0, 4, 9, 25, 16, 49, 36, 81, 64]
    

    Notice, unlike pool.map, the order of the results may not correspond to the order in which the pool.apply_async calls were made.


    So, if you need to run a function in a separate process, but want the current process to block until that function returns, use Pool.apply. Like Pool.apply, Pool.map blocks until the complete result is returned.

    If you want the Pool of worker processes to perform many function calls asynchronously, use Pool.apply_async. The order of the results is not guaranteed to be the same as the order of the calls to Pool.apply_async.

    Notice also that you could call a number of different functions with Pool.apply_async (not all calls need to use the same function).

    In contrast, Pool.map applies the same function to many arguments. However, unlike Pool.apply_async, the results are returned in an order corresponding to the order of the arguments.
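    As a sketch of submitting different functions through Pool.apply_async (square and negate are hypothetical), with the results read back in submission order via get():

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical job
    return x * x


def negate(x):
    # Another hypothetical job
    return -x


def main():
    with Pool() as pool:
        # Unlike map, each apply_async call may submit a different function
        handles = [
            pool.apply_async(square, (4,)),
            pool.apply_async(negate, (4,)),
        ]
        return [handle.get() for handle in handles]


if __name__ == "__main__":
    print(main())  # [16, -4]
```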
