Question
I assumed that the multiprocessing package used pickle to send things between processes. However, pickle pays attention to the __getstate__ and __setstate__ methods of an object. Multiprocessing seems to ignore them. Is this correct? Am I confused?
To reproduce, install Docker and run the following on the command line:
$ docker run python:3.4 python -c "import pickle
import multiprocessing
import os
class Tricky:
    def __init__(self, x):
        self.data = x
    def __setstate__(self, d):
        self.data = 10
    def __getstate__(self):
        return {}
def report(ar, q):
    print('running report in pid %d, hailing from %d' % (os.getpid(), os.getppid()))
    q.put(ar.data)
print('module loaded in pid %d, hailing from pid %d' % (os.getpid(), os.getppid()))
if __name__ == '__main__':
    print('hello from pid %d' % os.getpid())
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    p.start()
    p.join()
    print(q.get())
    print(pickle.loads(pickle.dumps(ar)).data)"
You should get something like
module loaded in pid 1, hailing from pid 0
hello from pid 1
running report in pid 5, hailing from 1
5
10
I would have thought it would have been "10" "10" but instead it is "5" "10". What could it mean?
(note: code edited to comply with programming guidelines, as suggested by user3667217)
Answer 1:
Reminder: when you're using multiprocessing, you need to start processes inside an if __name__ == '__main__': clause (see the programming guidelines):
import pickle
import multiprocessing

class Tricky:
    def __init__(self, x):
        self.data = x

    def __setstate__(self, d):
        print('setstate happening')
        self.data = 10

    def __getstate__(self):
        # Print before returning; a statement after return would never run.
        print('getstate happening')
        return self.data

def report(ar, q):
    q.put(ar.data)

if __name__ == '__main__':
    ar = Tricky(5)
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(ar, q))
    print('now starting process')
    p.start()
    print('now joining process')
    p.join()
    print('now getting results from queue')
    print(q.get())
    print('now getting pickle dumps')
    print(pickle.loads(pickle.dumps(ar)).data)
On Windows, I see:
now starting process
now joining process
setstate happening
now getting results from queue
10
now getting pickle dumps
setstate happening
10
On Ubuntu, I see:
now starting process
now joining process
now getting results from queue
5
now getting pickle dumps
getstate happening
setstate happening
10
I suppose this should answer your question. The multiprocessing module invokes the __setstate__ method on Windows but not on Linux. On Linux, the hooks only run in the explicit round trip: pickle.dumps calls __getstate__, and pickle.loads then calls __setstate__ on the reconstructed object. It's interesting to see how the multiprocessing module behaves differently on different platforms.
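To make the order of the two hooks visible without starting any process, here is a minimal sketch. The calls list is a hypothetical helper added for illustration; it is not part of the original code.

```python
import pickle

calls = []  # hypothetical helper: records the order in which the hooks fire


class Tricky:
    def __init__(self, x):
        self.data = x

    def __getstate__(self):
        calls.append('getstate')
        return self.data

    def __setstate__(self, state):
        calls.append('setstate')
        self.data = 10


ar = Tricky(5)
payload = pickle.dumps(ar)     # serializing calls __getstate__ on ar
after_dumps = list(calls)      # snapshot of the hooks seen so far
clone = pickle.loads(payload)  # deserializing calls __setstate__ on the new object

print(after_dumps, calls, ar.data, clone.data)
```

The original object keeps data == 5; only the object rebuilt by pickle.loads goes through __setstate__.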
Answer 2:
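The multiprocessing module can start child processes in one of three ways: spawn, fork, or forkserver. By default on Unix, it forks (this holds for the Python 3.4 used in the question; newer releases changed the default on some platforms). That means there's no need to pickle anything that's already loaded into RAM at the moment the new process is born: the child inherits a copy of the parent's memory, so the arguments passed to Process never go through __getstate__ or __setstate__.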
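If you are unsure which start method your platform uses, a small check along these lines should tell you. It only queries the module and starts no processes.

```python
import multiprocessing

# Start methods supported on this platform (fork is not available on Windows).
available = multiprocessing.get_all_start_methods()

# Start method that a plain multiprocessing.Process() will use here.
default = multiprocessing.get_start_method()

print('available:', available)
print('default:', default)
```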
If you need more direct control over how child processes are started, change the start method to spawn. To do this, create a context
ctx = multiprocessing.get_context('spawn')
and replace all calls to multiprocessing.foo() with calls to ctx.foo(). When you do this, every new process is born as a fresh Python interpreter; everything that gets sent into it is sent via pickle instead of being copied directly in memory.
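As a rough sketch of the difference, the script below runs the question's Tricky class under each start method available on the platform. The helper data_seen_by_child is a hypothetical name introduced for illustration. Save it as a file and run it as a script, since spawn needs to re-import the main module in the child.

```python
import multiprocessing


class Tricky:
    def __init__(self, x):
        self.data = x

    def __getstate__(self):
        return {}

    def __setstate__(self, state):
        self.data = 10


def report(ar, q):
    q.put(ar.data)


def data_seen_by_child(method):
    """Hypothetical helper: start a child with the given start method
    and return the value of ar.data that the child observed."""
    ctx = multiprocessing.get_context(method)
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(Tricky(5), q))
    p.start()
    value = q.get(timeout=60)  # read before join so a full pipe cannot block
    p.join()
    return value


if __name__ == '__main__':
    for method in multiprocessing.get_all_start_methods():
        print(method, data_seen_by_child(method))
```

Based on the outputs reported above, I would expect fork to report 5 (the object is inherited, not pickled) and spawn to report 10 (the object is rebuilt through __setstate__).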
Source: https://stackoverflow.com/questions/33685329/multiprocessing-ignores-setstate