Question
I am using Python's multiprocessing to create a parallel application. Processes need to share some data, for which I use a Manager. However, I have some common functions which the processes need to call, and which need to access the data stored by the Manager object. My question is whether I can avoid having to pass the Manager instance to these common functions as an argument and instead use it like a global. In other words, consider the following code:
import multiprocessing as mp

manager = mp.Manager()
global_dict = manager.dict(a=[0])

def add():
    global_dict['a'] += [global_dict['a'][-1] + 1]

def foo_parallel(var):
    add()
    print(var)

num_processes = 5
p = []
for i in range(num_processes):
    p.append(mp.Process(target=foo_parallel, args=(global_dict,)))

[pi.start() for pi in p]
[pi.join() for pi in p]
This runs fine and, on my machine, ends with global_dict['a'] = [0, 1, 2, 3, 4, 5]. However, is this "good form"? Is it just as good as defining add(var) and calling add(var) instead?
Answer 1:
Your code example has bigger problems than form. You get your desired output only by luck; repeated execution will yield different results. That's because += is not an atomic operation: multiple processes can read the same old value one after another, before any of them has updated it, and they will then write back the same result. To prevent this behaviour, you'll additionally have to use a Manager.Lock.
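The lost update can be made concrete by replaying the interleaving by hand. This is a minimal sketch, not the asker's exact failure mode: the two reads below stand in for two processes that both fetch the list from the Manager before either writes back.

```python
from multiprocessing import Manager

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict(a=[0])

        # Both "processes" read the old value before either writes back.
        read_1 = d['a']   # the proxy returns a local copy: [0]
        read_2 = d['a']   # the second reader also sees [0]

        d['a'] = read_1 + [read_1[-1] + 1]  # first write-back: [0, 1]
        d['a'] = read_2 + [read_2[-1] + 1]  # second write-back overwrites with [0, 1]

        print(d['a'])  # one increment is lost: [0, 1] instead of [0, 1, 2]
```

The same read-modify-write cycle is hidden inside `global_dict['a'] += [...]`, which is why the interleaving of real processes produces varying results without a lock.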
To your original question about "good form": IMO it would be cleaner to let the main function of the child process, foo_parallel, pass global_dict explicitly into a generic function add(var). That would be a form of dependency injection, which has some advantages. In your example, non-exhaustively, it:
- allows isolated testing
- increases code reusability
- makes debugging easier (detecting non-accessibility of the managed object isn't delayed until add is called: fail fast)
- reduces boilerplate code (for example, try-except blocks on resources multiple functions need)
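For instance, the injected version of add can be exercised as a pure function, with no Manager, Lock, or process machinery in sight. A minimal sketch of the "isolated testing" point above:

```python
def add(l):
    """Pure list-in, list-out helper: the caller decides where l comes from."""
    return l + [l[-1] + 1]

# Isolated tests need no Manager, no Lock, no child processes:
assert add([0]) == [0, 1]
assert add([0, 1, 2]) == [0, 1, 2, 3]
print('ok')
```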
As a side note: using list comprehensions only for their side effects is considered a "code smell". If you don't need a list as a result, just use a for-loop.
Code:
import os
from multiprocessing import Process, Manager

def add(l):
    l += [l[-1] + 1]
    return l

def foo_parallel(global_dict, lock):
    with lock:
        l = global_dict['a']
        global_dict['a'] = add(l)
        print(os.getpid(), global_dict)

if __name__ == '__main__':

    N_WORKERS = 5

    with Manager() as manager:

        lock = manager.Lock()
        global_dict = manager.dict(a=[0])

        pool = [Process(target=foo_parallel, args=(global_dict, lock))
                for _ in range(N_WORKERS)]

        for p in pool:
            p.start()
        for p in pool:
            p.join()

        print('result', global_dict)
Source: https://stackoverflow.com/questions/52435589/python-multiprocessing-and-manager