multiprocessing ignores “__setstate__”

Question

I assumed that the multiprocessing package used pickle to send things between processes. However, pickle pays attention to the __getstate__ and __setstate__ methods of an object. Multiprocessing seems to ignore them. Is this correct? Am I confused?

To reproduce, install Docker and run the following on the command line:

$ docker run python:3.4 python -c "import pickle
import multiprocessing
import os

class Tricky:
    def __init__(self,x):
        self.data=x

    def __setstate__(self,d):
        self.data=10

    def __getstate__(self):
        return {}

def report(ar,q):
    print('running report in pid %d, hailing from %d'%(os.getpid(),os.getppid()))
    q.put(ar.data)

print('module loaded in pid %d, hailing from pid %d'%(os.getpid(),os.getppid()))
if __name__ == '__main__':
    print('hello from pid %d'%os.getpid())
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    p.start()
    p.join()
    print(q.get())
    print(pickle.loads(pickle.dumps(ar)).data)"

You should get something like

module loaded in pid 1, hailing from pid 0
hello from pid 1
running report in pid 5, hailing from 1
5
10

I would have thought it would have been "10" "10" but instead it is "5" "10". What could it mean?

(note: code edited to comply with programming guidelines, as suggested by user3667217)

Answer 1:

Reminder: when you're using multiprocessing, you need to start processes inside an if __name__ == '__main__': clause (see the programming guidelines):

import pickle
import multiprocessing

class Tricky:
    def __init__(self,x):
        self.data=x

    def __setstate__(self, d):
        print('setstate happening')
        self.data = 10

    def __getstate__(self):
        print('getstate happening')
        return self.data

def report(ar,q):
    q.put(ar.data)

if __name__ == '__main__':
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    print('now starting process')
    p.start()
    print('now joining process')
    p.join()
    print('now getting results from queue')
    print(q.get())
    print('now getting pickle dumps')
    print(pickle.loads(pickle.dumps(ar)).data)

On Windows, I see:

now starting process
now joining process
setstate happening
now getting results from queue 
10
now getting pickle dumps
setstate happening
10

On Ubuntu, I see:

now starting process
now joining process
now getting results from queue
5
now getting pickle dumps
getstate happening
setstate happening
10

I suppose this answers your question: multiprocessing invokes the __setstate__ method on Windows but not on Linux. On Linux, the pickle.dumps / pickle.loads round trip first calls __getstate__ and then __setstate__. It's interesting to see the multiprocessing module behave differently on different platforms.
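Note that in the code above only ar.data, a plain int, travels through the queue, so the Tricky object itself is never sent that way. As a minimal sketch (the Tracked class and the roundtrip_through_queue helper are hypothetical names, not part of the original code), the following suggests that an object put on a multiprocessing.Queue is pickled regardless of platform, even when it is read back in the same process:

```python
import multiprocessing


class Tracked:
    """Hypothetical stand-in for Tricky that records whether pickle rebuilt it."""

    def __init__(self, x):
        self.data = x
        self.unpickled = False

    def __getstate__(self):
        return {"data": self.data}

    def __setstate__(self, state):
        self.data = state["data"]
        self.unpickled = True  # only set when pickle restores the object


def roundtrip_through_queue(obj):
    # put() hands the object to the queue, which pickles it before writing
    # it to the underlying pipe; get() unpickles it into a new copy.
    q = multiprocessing.Queue()
    q.put(obj)
    return q.get(timeout=10)


if __name__ == "__main__":
    received = roundtrip_through_queue(Tracked(5))
    print(received.data, received.unpickled)
```

So the platform difference concerns how the arguments of Process reach the child, not how queues move data.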

Answer 2:

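The multiprocessing module can start processes in one of three ways: spawn, fork, or forkserver. By default on Unix it forks (this holds for the Python 3.4 used in the question; newer Python releases changed the default on some platforms). Forking means there's no need to pickle anything that's already loaded into RAM at the moment the new process is born. Windows has no fork, so it always spawns, which is why __setstate__ is called there.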
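To see which start methods apply on a given machine, the standard library can be queried directly. A small sketch (the variable names are only illustrative):

```python
import multiprocessing

# Start methods supported on this platform, such as 'fork', 'spawn', 'forkserver'
available = multiprocessing.get_all_start_methods()

# The method used when none is chosen explicitly; it depends on the OS
# and on the Python version
default = multiprocessing.get_start_method()

print("available:", available)
print("default:", default)
```

If the default reported here is fork, arguments passed to Process are inherited rather than pickled, which matches the "5" seen in the question.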

If you need more direct control over how the new process is created, change the start method to spawn. To do this, create a context:

ctx = multiprocessing.get_context('spawn')

and replace all calls to multiprocessing.foo() with calls to ctx.foo(). When you do this, every new process is born as a fresh Python interpreter; everything that gets sent into it is sent via pickle, instead of being inherited as a copy of the parent's memory.
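The effect of the context can be sketched as follows. This is an illustration under assumptions, not the original poster's code: Tracked and child_saw_unpickled are hypothetical names, and the script is assumed to be run from a file so that a spawned child can re-import it.

```python
import multiprocessing


class Tracked:
    """Hypothetical stand-in for Tricky that records whether pickle rebuilt it."""

    def __init__(self, x):
        self.data = x
        self.unpickled = False

    def __getstate__(self):
        return {"data": self.data}

    def __setstate__(self, state):
        self.data = state["data"]
        self.unpickled = True  # only set when pickle restores the object


def report(obj, q):
    # Runs in the child: tell the parent whether __setstate__ ran on the argument
    q.put(obj.unpickled)


def child_saw_unpickled(method):
    # Use the context for both the Queue and the Process, as described above
    ctx = multiprocessing.get_context(method)
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(Tracked(5), q))
    p.start()
    result = q.get(timeout=30)  # read before join so the child can flush the queue
    p.join()
    return result


if __name__ == "__main__":
    for method in ("fork", "spawn"):
        if method in multiprocessing.get_all_start_methods():
            print(method, "->", child_saw_unpickled(method))
```

Where fork is available, this should report False for fork and True for spawn; on Windows only the spawn line appears.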

Source: https://stackoverflow.com/questions/33685329/multiprocessing-ignores-setstate

Tags

python

multiprocessing

pickle

python-multiprocessing