How to use multiprocessing pool.map with multiple arguments?

-上瘾入骨i 2020-11-21 11:24

In the Python multiprocessing library, is there a variant of pool.map which supports multiple arguments?

text = "test"
def         


        
20 Answers
  • 2020-11-21 11:48

    The answer to this is version- and situation-dependent. The most general answer for recent versions of Python (since 3.3) was first described below by J.F. Sebastian.1 It uses the Pool.starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function:

    import multiprocessing
    from itertools import product
    
    def merge_names(a, b):
        return '{} & {}'.format(a, b)
    
    if __name__ == '__main__':
        names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
        with multiprocessing.Pool(processes=3) as pool:
            results = pool.starmap(merge_names, product(names, repeat=2))
        print(results)
    
    # Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...
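
    When the arguments already sit in parallel lists rather than a Cartesian product, zip can build the tuples that starmap expects. A minimal sketch, assuming Python 3.3+; the pair_up helper and the left/right lists are made up for illustration:

```python
import multiprocessing

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def pair_up(left, right):
    # Hypothetical helper: zip yields one (a, b) tuple per position,
    # which is the shape starmap unpacks into merge_names(a, b).
    return list(zip(left, right))

if __name__ == '__main__':
    left = ['Brown', 'Wilson', 'Bartlett']
    right = ['Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, pair_up(left, right))
    print(results)
```

    Note that zip stops at the shorter input, so lists of unequal length silently drop the extra items.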
    

    For earlier versions of Python, you'll need to write a helper function to unpack the arguments explicitly. If you want to use with, you'll also need to write a wrapper to turn Pool into a context manager. (Thanks to muon for pointing this out.)

    import multiprocessing
    from itertools import product
    from contextlib import contextmanager
    
    def merge_names(a, b):
        return '{} & {}'.format(a, b)
    
    def merge_names_unpack(args):
        return merge_names(*args)
    
    @contextmanager
    def poolcontext(*args, **kwargs):
        pool = multiprocessing.Pool(*args, **kwargs)
        try:
            yield pool
        finally:
            # Shut the workers down even if the with-block raises
            pool.terminate()
    
    if __name__ == '__main__':
        names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
        with poolcontext(processes=3) as pool:
            results = pool.map(merge_names_unpack, product(names, repeat=2))
        print(results)
    
    # Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...
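
    If a hand-written context manager feels heavy, contextlib.closing from the standard library is one possible shortcut. It calls pool.close() on exit rather than pool.terminate(), so a pool.join() afterwards waits for the workers to finish. A sketch under that assumption, not a drop-in replacement for the wrapper above:

```python
import multiprocessing
from contextlib import closing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def merge_names_unpack(args):
    # Plain pool.map passes a single argument, so unpack the (a, b) tuple here
    return merge_names(*args)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett']
    pool = multiprocessing.Pool(processes=3)
    with closing(pool):
        # closing() calls pool.close() when the block exits
        results = pool.map(merge_names_unpack, product(names, repeat=2))
    pool.join()  # wait for the worker processes to exit
    print(results)
```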
    

    In simpler cases, with a fixed second argument, you can also use partial, but only in Python 2.7+.

    import multiprocessing
    from functools import partial
    from contextlib import contextmanager
    
    @contextmanager
    def poolcontext(*args, **kwargs):
        pool = multiprocessing.Pool(*args, **kwargs)
        try:
            yield pool
        finally:
            # Shut the workers down even if the with-block raises
            pool.terminate()
    
    def merge_names(a, b):
        return '{} & {}'.format(a, b)
    
    if __name__ == '__main__':
        names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
        with poolcontext(processes=3) as pool:
            results = pool.map(partial(merge_names, b='Sons'), names)
        print(results)
    
    # Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...
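
    On Python 3.3+, the same fixed-argument case can also be written without partial by pairing each name with a repeated constant and handing the pairs to starmap. A small sketch; the with_fixed helper is an invention for this example:

```python
import multiprocessing
from itertools import repeat

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def with_fixed(items, fixed):
    # Hypothetical helper: zip stops at the shorter input, so the endless
    # repeat(fixed) is cut off once items is exhausted.
    return list(zip(items, repeat(fixed)))

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, with_fixed(names, 'Sons'))
    print(results)
```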
    

    1. Much of this was inspired by his answer, which should probably have been accepted instead. But since this one is stuck at the top, it seemed best to improve it for future readers.

  • 2020-11-21 11:48

    There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap: its map functions mirror the API of Python's built-in map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. Pathos is due for a release after some mild updating, mostly conversion to Python 3.x.

      Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
      [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> def func(a,b):
      ...     print a,b
      ...
      >>>
      >>> from pathos.multiprocessing import ProcessingPool    
      >>> pool = ProcessingPool(nodes=4)
      >>> pool.map(func, [1,2,3], [1,1,1])
      1 1
      2 1
      3 1
      [None, None, None]
      >>>
      >>> # also can pickle stuff like lambdas 
      >>> result = pool.map(lambda x: x**2, range(10))
      >>> result
      [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
      >>>
      >>> # also does asynchronous map
      >>> result = pool.amap(pow, [1,2,3], [4,5,6])
      >>> result.get()
      [1, 32, 729]
      >>>
      >>> # or can return a map iterator
      >>> result = pool.imap(pow, [1,2,3], [4,5,6])
      >>> result
      <processing.pool.IMapIterator object at 0x110c2ffd0>
      >>> list(result)
      [1, 32, 729]
    

    pathos has several ways that you can get the exact behavior of starmap.

    >>> def add(*x):
    ...   return sum(x)
    ... 
    >>> x = [[1,2,3],[4,5,6]]
    >>> import pathos
    >>> import numpy as np
    >>> # use ProcessPool's map and transpose the inputs
    >>> pp = pathos.pools.ProcessPool()
    >>> pp.map(add, *np.array(x).T)
    [6, 15]
    >>> # use ProcessPool's map and a lambda to apply the star
    >>> pp.map(lambda x: add(*x), x)
    [6, 15]
    >>> # use a _ProcessPool, which has starmap
    >>> _pp = pathos.pools._ProcessPool()
    >>> _pp.starmap(add, x)
    [6, 15]
    >>> 
    
  • 2020-11-21 11:48

    Another way is to pass a list of lists to a one-argument routine:

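    import os
    from multiprocessing import Pool

    def task(args):
        # args is one [arg1, arg2] list taken from the mapped list of lists
        print("PID =", os.getpid(), ", arg1 =", args[0], ", arg2 =", args[1])

    if __name__ == '__main__':
        pool = Pool()

        pool.map(task, [
                [1,2],
                [3,4],
                [5,6],
                [7,8]
            ])

        pool.close()
        pool.join()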
    

    One can then construct the list of argument lists with one's favorite method.
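
    For instance, the argument lists could be built with a list comprehension over two sequences. A sketch in Python 3 syntax; build_args and the sample values are illustrative only, and task returns its result here instead of printing it:

```python
import os
from multiprocessing import Pool

def task(args):
    # Each element of the mapped list arrives as a single argument
    arg1, arg2 = args
    return os.getpid(), arg1 + arg2

def build_args(xs, ys):
    # Hypothetical helper: one [x, y] list per pair of inputs
    return [[x, y] for x, y in zip(xs, ys)]

if __name__ == '__main__':
    with Pool() as pool:
        for pid, total in pool.map(task, build_args([1, 3, 5, 7], [2, 4, 6, 8])):
            print("PID =", pid, ", sum =", total)
```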

  • 2020-11-21 11:48

    This is an example of the routine I use to pass multiple arguments to a one-argument function used in a pool.imap fork:

    from multiprocessing import Pool

    # Couple of variables for the example:
    var1 = [1, 2, 3, 5, 6, 7, 8]
    var2 = [9, 10, 11, 12]

    # Wrapper of the function to map:
    class makefun:
        def __init__(self, var2):
            self.var2 = var2
        def fun(self, i):
            # var1 is read from module scope; var2 is carried by the instance
            return var1[i] + self.var2

    if __name__ == '__main__':
        # Open the pool:
        pool = Pool(processes=2)

        # Wrapper loop
        for j in range(len(var2)):
            # Obtain the function to map
            pool_fun = makefun(var2[j]).fun

            # Fork loop
            for i, value in enumerate(pool.imap(pool_fun, range(len(var1)))):
                print(var1[i], '+', var2[j], '=', value)

        # Close the pool and wait for the workers to exit
        pool.close()
        pool.join()
    
  • 2020-11-21 11:49
    import multiprocessing

    text = "test"

    def unpack(args):
        # args[0] is the function to call; the remaining items are its arguments
        return args[0](*args[1:])

    def harvester(text, case):
        X = case[0]
        # Return the value so pool.map can collect it
        return text + str(X)

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=6)
        case = RAW_DATASET  # the asker's data set, not defined in this snippet
        # args is a list of tuples
        # with the function to execute as the first item in each tuple
        args = [(harvester, text, c) for c in case]
        # doing it this way, we can pass any function
        # and we don't need to define a wrapper for each different function
        # if we need to use more than one
        pool.map(unpack, args)
        pool.close()
        pool.join()
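
    As the comments in the code above suggest, a single map call can dispatch to different functions as long as each tuple leads with the function to call. A minimal sketch; add and scale are stand-in functions, not part of the original question, and they are defined at module level so they can be pickled:

```python
import multiprocessing

def unpack(args):
    # First item is the function, the rest are its positional arguments
    return args[0](*args[1:])

def add(a, b):
    return a + b

def scale(value, factor, offset):
    return value * factor + offset

if __name__ == '__main__':
    # Tuples may differ in length because each function takes its own arguments
    jobs = [(add, 1, 2), (scale, 3, 10, 1), (add, 'a', 'b')]
    pool = multiprocessing.Pool(processes=2)
    results = pool.map(unpack, jobs)
    pool.close()
    pool.join()
    print(results)  # [3, 31, 'ab']
```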
    
  • 2020-11-21 11:50

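    For Python 2, you can try this trick. Be aware that the standard multiprocessing.Pool pickles the function it maps, and the standard pickle module cannot serialize a lambda, so this call is expected to fail with a PicklingError; a module-level function or functools.partial avoids the problem.

    import multiprocessing

    def fun(a,b):
        return a+b

    pool = multiprocessing.Pool(processes=6)
    b=233
    pool.map(lambda x:fun(x,b),range(1000))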
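
    The standard Pool sends the mapped function to its workers by pickling it, so whether a given callable will work can be checked directly, without starting a pool. A minimal sketch in Python 3 syntax; can_pickle is a hypothetical helper written for this check:

```python
import pickle
from functools import partial

def fun(a, b):
    return a + b

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

if __name__ == '__main__':
    b = 233
    # A lambda has no importable name, so pickle cannot look it up
    print(can_pickle(lambda x: fun(x, b)))
    # partial over a module-level function pickles by reference to fun
    print(can_pickle(partial(fun, b=b)))
```

    If the lambda fails this check, partial(fun, b=b) is a close substitute that maps over the first argument in the same way.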
    