Question
I'm just experimenting and learning. I know how to create a shared dictionary that can be accessed by multiple processes, but I'm not sure how to keep the dict synced. defaultdict, I believe, illustrates the problem I'm having.
from collections import defaultdict
from multiprocessing import Pool, Manager, Process

#test without multiprocessing
s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1
print d.items() # Success! result: [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

print '*'*10, ' with multiprocessing ', '*'*10
def test(k, multi_dict):
    multi_dict[k] += 1

if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = Manager()
    multi_d = mgr.dict()
    for k in s:
        pool.apply_async(test, (k, multi_d))
    # Mark pool as closed -- no more tasks can be added.
    pool.close()
    # Wait for tasks to exit
    pool.join()
    # Output results
    print multi_d.items() #FAIL

print '*'*10, ' with multiprocessing and process module like on python site example', '*'*10
def test2(k, multi_dict2):
    multi_dict2[k] += 1

if __name__ == '__main__':
    manager = Manager()
    multi_d2 = manager.dict()
    for k in s:
        p = Process(target=test2, args=(k, multi_d2))
        p.start()
        p.join()
    print multi_d2 #FAIL
The first result works (because it's not using multiprocessing), but I'm having problems getting it to work with multiprocessing. I'm not sure how to solve it, but I think it might be due to it not being synced (and joining the results later), or maybe because within multiprocessing I cannot figure out how to set defaultdict(int) to the dictionary.
Any help or suggestions on how to get this to work would be great!
Answer 1:
You can subclass BaseManager and register additional types for sharing. You need to provide a suitable proxy type in cases where the default AutoProxy-generated type does not work. For defaultdict, if you only need to access the attributes that are already present in dict, you can use DictProxy.
from multiprocessing import Pool
from multiprocessing.managers import BaseManager, DictProxy
from collections import defaultdict

class MyManager(BaseManager):
    pass

MyManager.register('defaultdict', defaultdict, DictProxy)

def test(k, multi_dict):
    multi_dict[k] += 1

if __name__ == '__main__':
    pool = Pool(processes=4)
    mgr = MyManager()
    mgr.start()
    multi_d = mgr.defaultdict(int)
    for k in 'mississippi':
        pool.apply_async(test, (k, multi_d))
    pool.close()
    pool.join()
    print multi_d.items()
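One caveat with the snippet above: multi_dict[k] += 1 goes through the proxy as a separate read and a separate write, so two workers can read the same old value and one increment gets lost. Below is a minimal sketch of one way to guard against that, written for Python 3. It assumes you subclass SyncManager instead of BaseManager so that the manager also provides Lock(). The names LockingManager, locked_increment and count_chars are made up for this illustration and are not part of the answer above.

```python
from collections import defaultdict
from multiprocessing import Pool
from multiprocessing.managers import DictProxy, SyncManager


class LockingManager(SyncManager):
    """Hypothetical manager: everything SyncManager offers, plus defaultdict."""


# Same registration as in the answer, but on a SyncManager subclass.
LockingManager.register('defaultdict', defaultdict, DictProxy)


def locked_increment(key, shared, lock):
    # The read and the write are two separate calls to the manager process;
    # holding the lock keeps other workers from updating in between.
    with lock:
        shared[key] += 1


def count_chars(text, processes=4):
    with LockingManager() as manager:  # starts the manager, shuts it down on exit
        shared = manager.defaultdict(int)
        lock = manager.Lock()
        pool = Pool(processes=processes)
        results = [pool.apply_async(locked_increment, (ch, shared, lock))
                   for ch in text]
        pool.close()
        pool.join()
        for result in results:
            result.get()  # re-raise any exception that happened in a worker
        # Copy the contents out before the manager goes away.
        return dict(shared.items())


if __name__ == '__main__':
    print(sorted(count_chars('mississippi').items()))
```

With the lock held around each update, the counts should match the single-process loop from the question. The price is that every increment is serialized through the manager, so this is a sketch for correctness, not for speed.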
Answer 2:
Well, the Manager class seems to supply only a fixed number of predefined data structures which can be shared among processes, and defaultdict is not among them. If you really just need that one defaultdict, the easiest solution would be to implement the defaulting behavior on your own:
def test(k, multi_dict):
    if k not in multi_dict:
        multi_dict[k] = 0
    multi_dict[k] += 1
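It may also help to see why the original attempt printed an empty result instead of an error. Since Manager().dict() behaves like a plain dict, += on a missing key raises KeyError inside the worker, and apply_async keeps that exception in its result object until you ask for it. Here is a minimal sketch for Python 3 that surfaces those errors. The names increment and run_and_collect are made up for this illustration.

```python
from multiprocessing import Manager, Pool


def increment(key, shared):
    # On a plain dict, or a Manager().dict() proxy, a missing key raises KeyError.
    shared[key] += 1


def run_and_collect(text, processes=2):
    with Manager() as manager:
        shared = manager.dict()
        pool = Pool(processes=processes)
        results = [pool.apply_async(increment, (ch, shared)) for ch in text]
        pool.close()
        pool.join()
        errors = []
        for result in results:
            try:
                result.get()  # re-raises whatever the worker raised
            except KeyError as exc:
                errors.append(exc)
        # Copy the contents out before the manager shuts down.
        return dict(shared.items()), errors


if __name__ == '__main__':
    counts, errors = run_and_collect('mississippi')
    print(counts)
    print(len(errors))
```

Because no key is ever created, every task is expected to fail with KeyError, which leaves the dict empty. Calling get() on each result is what makes the failure visible.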
Source: https://stackoverflow.com/questions/9256687/using-defaultdict-with-multiprocessing