Python multiprocessing with start method 'spawn' doesn't work

问题

I wrote a Python class to plot pylots in parallel. It works fine on Linux where the default start method is fork but when I tried it on Windows I ran into problems (which can be reproduced on Linux using the spawn start method - see code below). I always end up getting this error:

Traceback (most recent call last):
  File "test.py", line 50, in <module>
    test()
  File "test.py", line 7, in test
    asyncPlotter.saveLinePlotVec3("test")
  File "test.py", line 41, in saveLinePlotVec3
    args=(test, ))
  File "test.py", line 34, in process
    p.start()
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle weakref objects

C:\Python\MonteCarloTools>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 99, in spawn_main
    new_handle = reduction.steal_handle(parent_pid, pipe_handle)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 82, in steal_handle
    _winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect

I hope there is a way to make this code work for Windows. Here a link to the different start methods available on Linux and Windows: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

import multiprocessing as mp
def test():

    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))

    asyncPlotter.saveLinePlotVec3("test")
    asyncPlotter.saveLinePlotVec3("test")

    asyncPlotter.join()


class AsyncPlotter():

    def __init__(self, nc, processes=mp.cpu_count()):

        self.nc = nc
        self.pids = []
        self.processes = processes


    def linePlotVec3(self, nc, processes, test):

        self.waitOnPool(nc, processes)

        print(test)

        nc.value -= 1


    def waitOnPool(self, nc, processes):

        while nc.value >= processes:
            time.sleep(0.1)
        nc.value += 1


    def process(self, target, args):

        ctx = mp.get_context('spawn') 
        p = ctx.Process(target=target, args=args)
        p.start()
        self.pids.append(p)


    def saveLinePlotVec3(self, test):

        self.process(target=self.linePlotVec3,
                       args=(self.nc, self.processes, test))


    def join(self):
        for p in self.pids:
            p.join()


if __name__=='__main__':
    test()

回答1:

When using the spawn start method, the Process object itself is being pickled for use in the child process. In your code, the target=target argument is a bound method of AsyncPlotter. It looks like the entire asyncPlotter instance must also be pickled for that to work, and that includes self.manager, which apparently doesn't want to be pickled.

In short, keep Manager outside of AsyncPlotter. This works on my macOS system:

def test():
    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))
    ...

Also, as noted in your comment, asyncPlotter did not work when reused. I don't know the details but looks like it has something to do with how the Value object is shared across processes. The test function would need to be like:

def test():
    manager = mp.Manager()
    nc = manager.Value('i', 0)

    asyncPlotter1 = AsyncPlotter(nc)
    asyncPlotter1.saveLinePlotVec3("test 1")
    asyncPlotter2 = AsyncPlotter(nc)
    asyncPlotter2.saveLinePlotVec3("test 2")

    asyncPlotter1.join()
    asyncPlotter2.join()

All in all, you might want to restructure your code and use a process pool. It already handles what AsyncPlotter is doing with cpu_count and parallel execution:

from multiprocessing import Pool, set_start_method
from random import random
import time

def linePlotVec3(test):
    time.sleep(random())
    print("test", test)

if __name__ == "__main__":
    set_start_method("spawn")
    with Pool() as pool:
        pool.map(linePlotVec3, range(20))

Or you could use a ProcessPoolExecutor to do pretty much the same thing. This example starts tasks one at a time instead of mapping to a list:

from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp
import time
from random import random

def work(i):
    r = random()
    print("work", i, r)
    time.sleep(r)

def main():
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        for i in range(20):
            pool.submit(work, i)

if __name__ == "__main__":
    main()

回答2:

For portability, all objects passed as arguments to a function that will be run in a process must be picklable.

来源：https://stackoverflow.com/questions/57191393/python-multiprocessing-with-start-method-spawn-doesnt-work

标签

python

multiprocessing