Question
import multiprocessing
import time
from subprocess import call, STDOUT
from glob import glob
import sys

def do_calculation(data):
    x = time.time()
    # run the external script on one .dex file, sending its stdout
    # and stderr to a per-file report
    with open(data + '.classes.report', 'w') as f:
        call(["external script", data], stdout=f.fileno(), stderr=STDOUT)
    return 'apk: {data!s} time {tim!s}'.format(data=data, tim=time.time() - x)

def start_process():
    print 'Starting', multiprocessing.current_process().name

if __name__ == '__main__':
    inputs = glob('./*.dex')

    # sequential run with the builtin map
    builtin_outputs = map(do_calculation, inputs)
    print 'Built-in:'
    for i in builtin_outputs:
        print i

    # parallel run with a worker pool
    pool_size = multiprocessing.cpu_count() * 2
    print 'Worker Pool size: %s' % pool_size
    pool = multiprocessing.Pool(processes=pool_size,
                                initializer=start_process,
                                )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks

    print 'Pool output:'
    for i in pool_outputs:
        print i
Surprisingly, builtin_outputs has a faster execution time than pool_outputs:
Built-in:
apk: ./TooDo_2.0.8.classes.dex time 5.69289898872
apk: ./TooDo_2.0.9.classes.dex time 5.37206411362
apk: ./Twitter_Client.classes.dex time 0.272782087326
apk: ./zaTelnet_Light.classes.dex time 0.141801118851
apk: ./Temperature_Converter.classes.dex time 0.270312070847
apk: ./Tipper_1.0.classes.dex time 0.293262958527
apk: ./XLive.classes.dex time 0.361288070679
apk: ./TwitterDroid_0.1.2_alpha.classes.dex time 0.381947040558
apk: ./Universal_Conversion_Application.classes.dex time 0.404763936996
Worker Pool size: 8
Pool output:
apk: ./TooDo_2.0.8.classes.dex time 5.72440505028
apk: ./TooDo_2.0.9.classes.dex time 5.9017829895
apk: ./Twitter_Client.classes.dex time 0.309305906296
apk: ./zaTelnet_Light.classes.dex time 0.374011039734
apk: ./Temperature_Converter.classes.dex time 0.450366973877
apk: ./Tipper_1.0.classes.dex time 0.379780054092
apk: ./XLive.classes.dex time 0.394504070282
apk: ./TwitterDroid_0.1.2_alpha.classes.dex time 0.505702018738
apk: ./Universal_Conversion_Application.classes.dex time 0.512043952942
How can this performance difference be explained?
Answer 1:
If the workload involved in "external script" is IO-heavy enough to saturate your hard disk, running multiple copies in parallel will only slow you down, because reading from multiple files at once incurs additional seeks.
The same goes if you're saturating your CPU and don't have multiple CPU cores available.
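One way to check that hypothesis is to time a single task alone, then several in parallel. Below is a minimal sketch, where run_one and the file names are hypothetical stand-ins for the question's external-script call: if N parallel tasks take close to N times as long as one task, they are serializing on a shared resource.

import time
import multiprocessing

def run_one(path):
    # stand-in for the external-script call from the question;
    # the sleep simulates the per-file work
    time.sleep(0.5)

if __name__ == '__main__':
    paths = ['a.dex', 'b.dex', 'c.dex', 'd.dex']

    # time one task on its own
    start = time.time()
    run_one(paths[0])
    single = time.time() - start

    # time the same work with all tasks in flight at once
    pool = multiprocessing.Pool(processes=len(paths))
    start = time.time()
    pool.map(run_one, paths)
    pool.close()
    pool.join()
    parallel = time.time() - start

    # if 'parallel' approaches single * len(paths), the tasks are
    # serializing on a shared resource (disk seeks, a single core)
    print 'one task: %.2fs, %d in parallel: %.2fs' % (single, len(paths), parallel)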
Answer 2:
When you use multiprocessing, it behooves you to give the worker processes enough computation to last at least a few seconds. If a worker finishes too quickly, more time is spent setting up the pool, spawning the subprocesses, and (potentially) switching between processes than actually doing the intended computation, which defeats the purpose of using multiprocessing.
Also, if you have a CPU-bound computation, then initializing a pool with more processes than cores (multiprocessing.cpu_count()) is counter-productive. It will make the OS switch between processes while not allowing the computation to proceed any faster.
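A minimal sketch of both suggestions, with a toy CPU-bound worker standing in for the real one: size the pool to cpu_count() instead of cpu_count() * 2, and use map's chunksize argument to batch small items so each dispatch carries enough work to amortize its overhead.

import multiprocessing

def work(n):
    # toy CPU-bound task standing in for a real computation
    total = 0
    for i in xrange(n):
        total += i * i
    return total

if __name__ == '__main__':
    # one worker per core; for CPU-bound work, extra processes
    # only add context switching
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    inputs = [100000] * 400
    # chunksize batches items so each dispatch carries a meaningful
    # amount of work instead of one tiny item at a time
    results = pool.map(work, inputs, chunksize=50)
    pool.close()
    pool.join()
    print 'computed %s results' % len(results)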
Answer 3:
def do_calculation(data):
    x = time.time()
    with open(data + '.classes.report', 'w') as f:
        call(["external script", data], stdout=f.fileno(), stderr=STDOUT)
    return 'apk: {data!s} time {tim!s}'.format(data=data, tim=time.time() - x)
You are measuring the time required to perform a single task. Running your tasks in parallel doesn't make each individual task shorter; rather, they all run at the same time. In other words, you are measuring this wrong: you should be timing the total for all tasks, not each task individually.
The slowness is probably because the tasks running at the same time interfere with each other somewhat, so no task runs at full speed.
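A sketch of the corrected measurement, reusing the structure of the question's script but with a stand-in worker: wrap each whole run in one timer and compare the totals, rather than timing inside the worker.

import time
import multiprocessing
from glob import glob

def do_calculation(data):
    # stand-in for the question's worker (per-task timer removed);
    # the sleep simulates the external-script call
    time.sleep(0.1)
    return data

if __name__ == '__main__':
    # fall back to dummy names if no .dex files are present
    inputs = glob('./*.dex') or ['a.dex', 'b.dex', 'c.dex', 'd.dex']

    # total wall-clock time for the sequential run
    start = time.time()
    builtin_outputs = map(do_calculation, inputs)
    print 'Built-in total: %.2fs' % (time.time() - start)

    # total wall-clock time for the pooled run
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    start = time.time()
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print 'Pool total: %.2fs' % (time.time() - start)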
Source: https://stackoverflow.com/questions/9169538/why-is-multiprocessing-pool-map-slower-than-builtin-map