How to pass a function with more than one argument to python concurrent.futures.ProcessPoolExecutor.map()?

执笔经年 2021-01-04 02:16

I would like concurrent.futures.ProcessPoolExecutor.map() to call a function consisting of 2 or more arguments. In the example below, I have resorted to using a

3 answers
  • 2021-01-04 02:34

    Regarding your first question, do I understand it correctly that you want to pass an argument whose value is determined only at the time you call map but constant for all instances of the mapped function? If so, I would do the map with a function derived from a "template function" with the second argument (ref in your example) baked into it using functools.partial:

    import concurrent.futures as cf
    from functools import partial

    refval = 5  # the constant argument to bake into the function
    
    def _findmatch(ref, listnumber):  # arguments swapped
        ...
    
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        for n in executor.map(partial(_findmatch, refval), numberlist):
            ...
    

    Re. question 2, first part: I haven't found the exact piece of code that tries to pickle (serialize) the function that should then be executed in parallel, but it sounds natural that that has to happen -- not only the arguments but also the function has to be transferred to the workers somehow, and it likely has to be serialized for this transfer. The fact that partial functions can be pickled while lambdas cannot is mentioned elsewhere, for instance here: https://stackoverflow.com/a/19279016/6356764.
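
    The pickling claim can be checked without starting any worker processes by calling pickle directly. This is only a minimal sketch: the body of _findmatch below is a simplified stand-in (an assumption, not the asker's exact code), using the swapped argument order from the snippet above.

```python
import pickle
from functools import partial


def _findmatch(ref, listnumber):
    """Simplified stand-in: return listnumber as a string if ref occurs in it."""
    ref, listnumber = str(ref), str(listnumber)
    return listnumber if ref in listnumber else ''


# A partial of a module-level function can be pickled, because the wrapped
# function is stored by reference to its importable name.
bound = partial(_findmatch, 5)
restored = pickle.loads(pickle.dumps(bound))
print(repr(restored(15)))

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda n: _findmatch(5, n))
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print('cannot pickle the lambda:', type(exc).__name__)
```

    The same round trip is roughly what has to succeed for anything handed to the worker processes, which is why the partial works with ProcessPoolExecutor.map and the lambda does not.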

    Re. question 2, second part: if you wanted to call a function with more than one argument in ProcessPoolExecutor.map, you would pass it the function as the first argument, followed by an iterable of first arguments for the function, followed by an iterable of its second arguments etc. In your case:

    for n in executor.map(_findmatch, numberlist, ref):
        ...
    
  • 2021-01-04 02:37

    To answer your second question first, you are getting an exception because a lambda function like the one you're using is not picklable. Since Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, this is a problem. It's not clear why you are using a lambda at all. The lambda you had takes two arguments, just like the original function. You could use _findmatch directly instead of the lambda and it should work.

    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        for n in executor.map(_findmatch, numberlist, ref):
            ...
    

    As for the first issue about passing the second, constant argument without creating a giant list, you could solve this in several ways. One approach might be to use itertools.repeat to create an iterable object that repeats the same value forever when iterated on.
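
    To see how the pairing works, the built-in map can stand in for executor.map, since both take one iterable per function argument and stop at the shortest one. This is a sketch under that assumption, with a simplified _findmatch that is illustrative rather than the asker's exact code:

```python
import itertools


def _findmatch(listnumber, ref):
    """Simplified stand-in: return listnumber as a string if ref occurs in it."""
    listnumber, ref = str(listnumber), str(ref)
    return listnumber if ref in listnumber else ''


numberlist = range(10)

# itertools.repeat(5) never ends on its own, but map stops as soon as the
# shortest iterable (numberlist) is exhausted, so exactly 10 calls are made.
results = list(map(_findmatch, numberlist, itertools.repeat(5)))
print(results)
```

    Replacing map with executor.map inside a ProcessPoolExecutor block should give the same results, computed in worker processes, without ever building a list of repeated values.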

    But a better approach would probably be to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) It should work if the function you use is accessible at the module's top-level namespace:

    def _helper(x):
        return _findmatch(x, 5)
    
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        for n in executor.map(_helper, numberlist):
            ...
    
  • 2021-01-04 02:45

    (1) No need to make a list. You can use itertools.repeat to create an iterator that just repeats the same value.

    (2) You need to pass a named function to map because it is sent to the worker processes for execution. map uses the pickle protocol to send things, and lambdas can't be pickled, so they can't be passed to map. But the lambda is totally unnecessary anyway: all it did was call a 2-parameter function with 2 parameters. Remove it completely.

    The working code is

    import concurrent.futures as cf
    import itertools
    
    nmax = 10
    numberlist = range(nmax)
    workers = 3
    
    def _findmatch(listnumber, ref):    
        print('def _findmatch(listnumber, ref):')
        x=''
        listnumber=str(listnumber)
        ref = str(ref)
        print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
        if ref in listnumber:
            x = listnumber
        print('x = {0}'.format(x))
        return x 
    
    if __name__ == '__main__':  # guard needed where workers are spawned (Windows, macOS)
        with cf.ProcessPoolExecutor(max_workers=workers) as executor:
            #for n in executor.map(_findmatch, numberlist):
            for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
                print(type(n))
                print(n)
                #if str(ref[0]) in n:
                #    print('match')
    