When you map an iterable to a multiprocessing.Pool, are the iterations divided into a queue for each process in the pool at the start, or is there a shared queue from which a task is taken when a process comes free?
To estimate the chunksize used by a Python implementation without looking at its multiprocessing module source code, run:
    #!/usr/bin/env python
    import multiprocessing as mp
    from itertools import groupby

    def work(index):
        mp.get_logger().info(index)
        return index, mp.current_process().name

    if __name__ == "__main__":
        import logging
        import sys
        logger = mp.log_to_stderr()
        # process cmdline args
        try:
            sys.argv.remove('--verbose')
        except ValueError:
            pass  # not verbose
        else:
            logger.setLevel(logging.INFO)  # verbose
        nprocesses, nitems = int(sys.argv.pop(1)), int(sys.argv.pop(1))
        # choices: 'map', 'imap', 'imap_unordered'
        map_name = sys.argv[1] if len(sys.argv) > 1 else 'map'
        kwargs = dict(chunksize=int(sys.argv[2])) if len(sys.argv) > 2 else {}
        # estimate chunksize used
        max_chunksize = 0
        map_func = getattr(mp.Pool(nprocesses), map_name)
        for _, group in groupby(sorted(map_func(work, range(nitems), **kwargs),
                                       key=lambda x: x[0]),  # sort by index
                                key=lambda x: x[1]):  # group by process name
            max_chunksize = max(max_chunksize, len(list(group)))
        print("%s: max_chunksize %d" % (map_name, max_chunksize))
It shows that imap and imap_unordered use chunksize=1 by default, while max_chunksize for map depends on nprocesses and nitems (the number of chunks per process is not fixed) and on the Python version. All *map* functions honor the chunksize parameter if it is specified.
$ ./estimate_chunksize.py nprocesses nitems [map_name [chunksize]] [--verbose]
To see how individual jobs are distributed, pass the --verbose parameter.
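However the chunksize is chosen, the splitting itself can be sketched as follows. This is a simplified model of how a pool carves the input iterable into chunks before putting them on the single shared task queue; get_chunks is a hypothetical helper for illustration, not part of the multiprocessing API:

```python
from itertools import islice

def get_chunks(iterable, chunksize):
    """Yield successive tuples of up to `chunksize` items.

    Workers pull whole chunks from a shared queue as they become
    free, so a large chunksize reduces IPC overhead but can leave
    some workers idle near the end of the input.
    """
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk

print(list(get_chunks(range(10), 3)))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

Note the final chunk may be shorter than chunksize, which is why the script above reports the maximum run length per process rather than assuming all runs are equal.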