Python concurrent.futures: ProcessPoolExecutor fail to work

前端 未结 1 537
[愿得一人]
[愿得一人] 2021-01-22 03:31

I\'m trying to use the ProcessPoolExecutor method but it fails. Here is an example(calculation of the big divider of two numbers) of a failed use. I don\'t understand what the m

相关标签:
1条回答
  • 2021-01-22 04:11

    Change your code to look like this, and it will work:

    from time import time
    from concurrent.futures import ProcessPoolExecutor
    def gcd(pair):
        a, b = pair
        low = min(a, b)
        for i in range(low, 0, -1):
            if a % i == 0 and b % i == 0:
                return i
    
    numbers = [(1963309, 2265973), (2030677, 3814172),
               (1551645, 2229620), (2039045, 2020802)]
    
    def main():
        start = time()
        pool = ProcessPoolExecutor(max_workers=3)
        results = list(pool.map(gcd, numbers))
        end = time()
        print('Took %.3f seconds' % (end - start))
    
    
    if __name__ == '__main__':
        main()
    

    On systems that support fork() this isn't necessary, because your script is imported just once, and then each process started ProcessPoolExecutor will already have a copy of objects in your global namespace like the gcd function. Once they're forked, they go through a bootstrap process whereby they start running their target function (in this case a worker process loop that accepts jobs from the process pool executor) and they never return to the original code in your main module from which they were forked.

    By contrast, if you're using the spawn-based processes which are the default on Windows and OSX, a new process must be started up from scratch for each worker process, and if they must re-import your module. However, if your module does something like ProcessPoolExecutor directly in the module body, without guarding it like if __name__ == '__main__':, then there is no way for them to import your module without also starting a new ProcessPoolExecutor. So this error you're getting is essentially guarding you from creating an infinite process bomb.

    This is mentioned in the docs for ProcessPoolExecutor:

    The __main__ module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter.

    But they don't really make clear why that is or what it means for the __main__ module to be "importable". When you write a simple script in Python and run it like python foo.py, your script foo.py is loaded with a module name of __main__, as opposed to a module named foo which you get if you import foo. For it to be "importable" in this case really means, importable without major side-effects like spawning new processes.

    0 讨论(0)
提交回复
热议问题