Python multiprocessing and Manager

Question


I am using Python's multiprocessing to create a parallel application. Processes need to share some data, for which I use a Manager. However, I have some common functions which processes need to call and which need to access the data stored by the Manager object. My question is whether I can avoid needing to pass the Manager instance to these common functions as an argument and rather use it like a global. In other words, consider the following code:

import multiprocessing as mp

manager = mp.Manager()
global_dict = manager.dict(a=[0])

def add():
    global_dict['a'] += [global_dict['a'][-1]+1]

def foo_parallel(var):
    add()
    print(var)

num_processes = 5
p = []
for i in range(num_processes):
    p.append(mp.Process(target=foo_parallel,args=(global_dict,)))

[pi.start() for pi in p]
[pi.join() for pi in p]

This runs fine and leaves global_dict['a'] as [0, 1, 2, 3, 4, 5] on my machine. However, is this "good form"? Is it just as good as defining add(var) and calling add(var) instead?


Answer 1:


Your code example has bigger problems than form. You get your desired output only by luck; repeated execution will yield different results. That's because += is not an atomic operation: multiple processes can read the same old value one after another before any of them has written back its update, and they will then all write back the same new value, losing increments. To prevent this behaviour, you'll additionally have to use a Manager.Lock(). A sketch of the race follows.
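A minimal sketch of that lost-update window (the helper name racy_add and the worker count of 50 are illustrative, not from the question):

import multiprocessing as mp

def racy_add(d):
    # step 1: read the current list through the proxy (a copy)
    current = d['a']
    # step 2: write the extended list back; if another process wrote
    # in between, its increment is silently overwritten
    d['a'] = current + [current[-1] + 1]

if __name__ == '__main__':
    manager = mp.Manager()
    d = manager.dict(a=[0])
    workers = [mp.Process(target=racy_add, args=(d,)) for _ in range(50)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # without a lock, fewer than 50 increments usually survive
    print(len(d['a']) - 1, 'of 50 increments survived')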


As to your original question about "good form":

IMO it would be cleaner to let foo_parallel, the main function of the child process, pass global_dict explicitly into a generic function add(var). That would be a form of dependency injection, which has several advantages. In your example, non-exhaustively, it:

  • allows isolated testing (see the sketch after this list)
  • increases code reusability
  • makes debugging easier: detecting non-accessibility of the managed object isn't delayed until add is called (fail fast)
  • reduces boilerplate code (for example, try/except blocks on resources that multiple functions need)
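For the isolated-testing point, a minimal sketch (assuming the add(l) signature from the code below): with the dependency injected, add can be exercised on a plain list, with no Manager and no subprocess involved:

def add(l):
    l += [l[-1] + 1]
    return l

# plain unit-style checks; no multiprocessing machinery needed
assert add([0]) == [0, 1]
assert add([0, 1, 2]) == [0, 1, 2, 3]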

As a side note: using list comprehensions only for their side effects is considered a 'code smell'. If you don't need a list as the result, just use a for-loop, as in the sketch below.
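For example, the question's two side-effect-only comprehensions become plain loops (p is the process list from the question's snippet):

for pi in p:
    pi.start()

for pi in p:
    pi.join()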

Code:

import os
from multiprocessing import Process, Manager


def add(l):
    # generic helper: operates on a plain list, no Manager involved
    l += [l[-1] + 1]
    return l


def foo_parallel(global_dict, lock):
    # hold the lock for the whole read-modify-write cycle
    with lock:
        l = global_dict['a']
        global_dict['a'] = add(l)
        print(os.getpid(), global_dict)


if __name__ == '__main__':

    N_WORKERS = 5

    with Manager() as manager:

        lock = manager.Lock()
        global_dict = manager.dict(a=[0])

        pool = [Process(target=foo_parallel, args=(global_dict, lock))
                for _ in range(N_WORKERS)]

        for p in pool:
            p.start()

        for p in pool:
            p.join()

        print('result', global_dict)
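With the lock in place, each of the five workers performs exactly one read-modify-write, so a run should always end with result {'a': [0, 1, 2, 3, 4, 5]}; only the order of the per-PID lines above it varies between runs.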


Source: https://stackoverflow.com/questions/52435589/python-multiprocessing-and-manager
