Question
I have this very simple Python code that I want to speed up by parallelizing it. However, no matter what I do, multiprocessing.Pool.map gains nothing over the standard map.
I've read other threads where people apply this to very small functions that don't parallelize well and lead to excessive overhead, but I would think that shouldn't be the case here.
Am I doing something wrong?
Here's the example:
#!/usr/bin/python
import numpy, time
import multiprocessing
import concurrent.futures

def AddNoise(sample):
    #time.sleep(0.001)
    return sample + numpy.random.randint(0, 9, sample.shape)
    #return sample + numpy.ones(sample.shape)

n = 100
m = 10000

# build the test data
start = time.time()
A = [numpy.random.randint(0, 9, (n, n)) for i in range(m)]
print("creating %d numpy arrays of %d x %d took %.2f seconds" % (m, n, n, time.time() - start))

# serial baseline
for i in range(3):
    start = time.time()
    A = list(map(AddNoise, A))
    print("adding numpy arrays took %.2f seconds" % (time.time() - start))

# multiprocessing.Pool
for i in range(3):
    start = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        A = list(pool.map(AddNoise, A, chunksize=100))
    print("adding numpy arrays with multiprocessing Pool took %.2f seconds" % (time.time() - start))

# concurrent.futures.ProcessPoolExecutor
for i in range(3):
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        A = list(executor.map(AddNoise, A))
    print("adding numpy arrays with concurrent.futures.ProcessPoolExecutor took %.2f seconds" % (time.time() - start))
This results in the following output on my 4-core/8-thread laptop, which is otherwise idle:
$ python test-pool.py
creating 10000 numpy arrays of 100 x 100 took 1.54 seconds
adding numpy arrays took 1.65 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays with multiprocessing Pool took 1.99 seconds
adding numpy arrays with multiprocessing Pool took 1.98 seconds
adding numpy arrays with multiprocessing Pool took 1.94 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.32 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.17 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.25 seconds
Answer 1:
The problem is in the result transfer. With multiprocessing, the arrays created inside the child processes have to be pickled and sent back to the main process, and that is an overhead.
I checked this by modifying the AddNoise function as follows, which preserves the computation time but discards the transfer time:
def AddNoise(sample):
    # do the work but drop the result, so nothing is sent back to the parent
    sample + numpy.random.randint(0, 9, sample.shape)
    return None
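To put a rough number on that transfer cost (this measurement is my addition, not part of the original answer): multiprocessing pickles each argument array on the way out and each result array on the way back, so a lower bound on the overhead is one pickle/unpickle round-trip per array. A minimal sketch:

#!/usr/bin/python
# Sketch: estimate the serialization cost that multiprocessing pays.
# Each array crosses the process boundary twice (argument out, result back),
# and every crossing is roughly one pickle/unpickle round-trip.
import pickle, time
import numpy

m, n = 10000, 100
arrays = [numpy.random.randint(0, 9, (n, n)) for _ in range(m)]

start = time.time()
for a in arrays:
    pickle.loads(pickle.dumps(a))
print("one pickle round-trip for %d arrays took %.2f seconds" % (m, time.time() - start))

On a typical 64-bit platform each 100 x 100 int64 array is about 80 KB, so 10000 of them amount to roughly 800 MB per direction; serializing and shuttling that much data can easily rival the ~1.5 seconds the computation itself takes, which is consistent with the timings above.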
Source: https://stackoverflow.com/questions/48489753/why-doesnt-multiprocessing-pool-map-speed-up-compared-to-serial-map