Question
I have two pieces of code; one is a pooled (multiprocessing) version of the other. However, the parallel version is taking a very long time even with a single worker, whereas the serial version finishes in about 15 seconds. Can someone help me speed up the second version?
- Serial
import numpy as np, time

def mapTo(d):
    # Append each element's 1-based position to the list keyed by its value.
    global tree
    for idx, item in enumerate(list(d), start=1):
        tree[str(item)].append(idx)

data = np.random.randint(1, 4, 20000000)
tree = {"1": [], "2": [], "3": []}

s = time.perf_counter()
mapTo(data)
e = time.perf_counter()
print("elapsed time:", e - s)
takes: ~15 sec
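For context, most of that ~15 s is the pure-Python loop over 20 million elements. A minimal vectorized sketch (not from the original post, and assuming the values are only 1-3 as produced by np.random.randint(1, 4, ...)) builds the same index lists without a per-element loop:

import numpy as np, time

data = np.random.randint(1, 4, 20000000)

s = time.perf_counter()
# np.flatnonzero returns the 0-based indices where the condition holds;
# adding 1 reproduces the 1-based positions used by the original loop.
tree = {str(v): (np.flatnonzero(data == v) + 1).tolist() for v in (1, 2, 3)}
e = time.perf_counter()
print("elapsed time:", e - s)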
- Parallel
from multiprocessing import Manager, Pool
from functools import partial
import numpy as np
import time

def mapTo(i_d, tree):
    # Each call reads the proxied list from the Manager process, appends
    # locally, and writes the whole list back (several IPC round trips per item).
    idx, item = i_d
    l = tree[str(item)]
    l.append(idx)
    tree[str(item)] = l

manager = Manager()
data = np.random.randint(1, 4, 20000000)
# sharedtree = manager.dict({"1": manager.list(), "2": manager.list(), "3": manager.list()})
sharedtree = manager.dict({"1": [], "2": [], "3": []})

s = time.perf_counter()
with Pool(processes=1) as pool:
    pool.map(partial(mapTo, tree=sharedtree), list(enumerate(data, start=1)))
e = time.perf_counter()
print("elapsed time:", e - s)
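The cost here is the per-item traffic through the Manager proxy: every element is pickled to a worker, and each mapTo call then makes further IPC calls to read and rewrite the shared list, 20 million times. A sketch of one common workaround (not from the original post; names like map_chunk are illustrative) is to split the array into chunks, let each worker fill an ordinary local dict, and merge the partial results in the parent, so the only IPC is one chunk in and one partial dict out per worker:

from multiprocessing import Pool
import numpy as np
import time

def map_chunk(args):
    # Build a plain local dict for one slice; `offset` restores the 1-based
    # positions relative to the full array.
    offset, chunk = args
    local = {"1": [], "2": [], "3": []}
    for idx, item in enumerate(chunk, start=offset + 1):
        local[str(item)].append(idx)
    return local

if __name__ == "__main__":
    data = np.random.randint(1, 4, 20000000)
    n_workers = 4
    bounds = np.linspace(0, len(data), n_workers + 1, dtype=int)
    jobs = [(int(bounds[i]), data[bounds[i]:bounds[i + 1]]) for i in range(n_workers)]

    s = time.perf_counter()
    tree = {"1": [], "2": [], "3": []}
    with Pool(processes=n_workers) as pool:
        # pool.map returns partial dicts in job order, so extending in that
        # order keeps each key's index list ascending.
        for partial_tree in pool.map(map_chunk, jobs):
            for key, indices in partial_tree.items():
                tree[key].extend(indices)
    e = time.perf_counter()
    print("elapsed time:", e - s)

Whether this beats the serial loop depends on how much the chunk pickling and merging cost relative to the loop itself, so it is worth timing on the target machine.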
Source: https://stackoverflow.com/questions/61593959/efficient-implmentation-of-python-multiprocesssing-pool