Question
I have this very simple Python code that I want to speed up by parallelizing it. However, no matter what I do, multiprocessing.Pool.map gains nothing over the standard map.
I've read other threads where people apply this to very small functions that don't parallelize well and lead to excessive overhead, but I would think that shouldn't be the case here.
Am I doing something wrong?
Here's the example:
#!/usr/bin/python
import numpy, time
import multiprocessing
import concurrent.futures

def AddNoise(sample):
    #time.sleep(0.001)
    return sample + numpy.random.randint(0, 9, sample.shape)
    #return sample + numpy.ones(sample.shape)

n = 100
m = 10000

# build the test data
start = time.time()
A = [numpy.random.randint(0, 9, (n, n)) for i in range(m)]
print("creating %d numpy arrays of %d x %d took %.2f seconds" % (m, n, n, time.time() - start))

# serial baseline
for i in range(3):
    start = time.time()
    A = list(map(AddNoise, A))
    print("adding numpy arrays took %.2f seconds" % (time.time() - start))

# multiprocessing.Pool
for i in range(3):
    start = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        A = list(pool.map(AddNoise, A, chunksize=100))
    print("adding numpy arrays with multiprocessing Pool took %.2f seconds" % (time.time() - start))

# concurrent.futures.ProcessPoolExecutor
for i in range(3):
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        A = list(executor.map(AddNoise, A))
    print("adding numpy arrays with concurrent.futures.ProcessPoolExecutor took %.2f seconds" % (time.time() - start))
This results in the following output on my 4-core/8-thread laptop, which is otherwise idle:
$ python test-pool.py
creating 10000 numpy arrays of 100 x 100 took 1.54 seconds
adding numpy arrays took 1.65 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays with multiprocessing Pool took 1.99 seconds
adding numpy arrays with multiprocessing Pool took 1.98 seconds
adding numpy arrays with multiprocessing Pool took 1.94 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.32 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.17 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.25 seconds
Answer 1:
The problem is in the result transfer. With multiprocessing, the arrays created inside the child processes have to be pickled and sent back to the main process, and that is an overhead.
I checked this by modifying the AddNoise function as follows, which preserves the computation time but discards the transfer time:
def AddNoise(sample):
    # do the work but drop the result, so nothing is sent back to the parent
    sample + numpy.random.randint(0, 9, sample.shape)
    return None
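To put a rough number on that transfer cost (this measurement is my addition, not part of the original answer): multiprocessing pickles each argument array on the way out and each result array on the way back, so a lower bound on the overhead is one pickle/unpickle round-trip per array. A minimal sketch:

#!/usr/bin/python
# Sketch: estimate the serialization cost that multiprocessing pays.
# Each array crosses the process boundary twice (argument out, result back),
# and every crossing is roughly one pickle/unpickle round-trip.
import pickle, time
import numpy

m, n = 10000, 100
arrays = [numpy.random.randint(0, 9, (n, n)) for _ in range(m)]

start = time.time()
for a in arrays:
    pickle.loads(pickle.dumps(a))
print("one pickle round-trip for %d arrays took %.2f seconds" % (m, time.time() - start))

On a typical 64-bit platform each 100 x 100 int64 array is about 80 KB, so 10000 of them amount to roughly 800 MB per direction; serializing and shuttling that much data can easily rival the ~1.5 seconds the computation itself takes, which is consistent with the timings above.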
Source: https://stackoverflow.com/questions/48489753/why-doesnt-multiprocessing-pool-map-speed-up-compared-to-serial-map