How can I make multiprocessing.pool.map distribute processes in numerical order?
More Info:
I have a program which processes a few thousand data files, mak
What about changing map to imap:
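import os
import time
from multiprocessing import Pool

num_proc = 4
num_calls = 20
sleeper = 0.1

def SomeFunc(arg):
    time.sleep(sleeper)
    # Print the worker's PID and the argument it handled.
    print("%s %5d" % (os.getpid(), arg))
    return arg

if __name__ == "__main__":
    proc_pool = Pool(num_proc)
    # list() consumes the lazy iterator so that every task actually runs.
    list(proc_pool.imap(SomeFunc, range(num_calls)))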
The reason may be that the default chunksize of imap is 1, so tasks are handed out one at a time. As a result, it may not run as fast as map.
This occurs because each process is given a predefined amount of work at the start of the call to map, and that amount depends on the chunksize. We can work out the default chunksize by looking at the source for pool.map:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
So for a range of 20 and with 4 processes, divmod(20, 16) returns (1, 4); because the remainder is nonzero, the chunksize is rounded up to 2.
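To check the arithmetic for other inputs, the calculation can be wrapped in a small function. This is only a sketch of the snippet quoted from the source: default_chunksize is a hypothetical helper name, not part of multiprocessing, and the 4000-item case is an illustrative figure rather than a number from the question.

```python
def default_chunksize(num_items, num_workers):
    """Mirror the divmod-based calculation quoted from pool.map.

    Hypothetical helper for illustration; not a multiprocessing API.
    """
    chunksize, extra = divmod(num_items, num_workers * 4)
    if extra:
        # A nonzero remainder rounds the chunk size up by one.
        chunksize += 1
    return chunksize


print(default_chunksize(20, 4))    # 2
print(default_chunksize(4000, 4))  # 250 (illustrative input)
```

With thousands of items the default chunks become large, which is why the ordering looks so scattered on a big job.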
If we modify your code to reflect this we should get similar results to the results you are getting now:
proc_pool.map(SomeFunc, range(num_calls), chunksize=2)
This yields the output:
0 2 6 4 1 7 5 3 8 10 12 14 9 13 15 11 16 18 17 19
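The pattern in that output (0, 2, 4 and 6 appearing before 1, 3, 5 and 7) follows from how the input is cut into chunks. The sketch below assumes the iterable is sliced into consecutive tuples of chunksize items; split_into_chunks is a hypothetical helper written for illustration, not a function exposed by multiprocessing.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    """Yield consecutive tuples of at most `chunksize` items.

    Hypothetical helper; an assumption about how the work is grouped.
    """
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


chunks = list(split_into_chunks(range(20), 2))
print(chunks[:4])  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```

Each of the 4 processes takes one of these first four chunks, so the first element of every chunk is printed before any second element.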
Now, setting chunksize=1 will ensure that each process within the pool is only given one task at a time:
proc_pool.map(SomeFunc, range(num_calls), chunksize=1)
This should give a reasonably good numerical ordering compared with not specifying a chunksize. For example, a chunksize of 1 yields the output:
0 1 2 3 4 5 6 7 9 10 8 11 13 12 15 14 16 17 19 18
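The two outputs can be compared against an idealized model. The sketch below is a simulation, not a call into multiprocessing: it assumes every task takes exactly one time unit and that an idle worker always takes the next chunk in order. simulated_start_order is a hypothetical name, and real runs will deviate slightly because of scheduling jitter, as the outputs above show.

```python
import heapq
from itertools import islice


def simulated_start_order(num_items, num_workers, chunksize):
    """Return task indices in the order they would start.

    Assumes equal task durations (one time unit each) and that the
    next free worker takes the next chunk. Illustrative only.
    """
    it = iter(range(num_items))
    # Heap of (time_worker_becomes_free, worker_id).
    free_at = [(0, w) for w in range(num_workers)]
    heapq.heapify(free_at)
    starts = []  # (start_time, task_index)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            break
        t, worker = heapq.heappop(free_at)
        # A worker runs the tasks of its chunk back to back.
        for offset, task in enumerate(chunk):
            starts.append((t + offset, task))
        heapq.heappush(free_at, (t + len(chunk), worker))
    return [task for _, task in sorted(starts)]


print(simulated_start_order(20, 4, 2))
# [0, 2, 4, 6, 1, 3, 5, 7, 8, 10, 12, 14, 9, 11, 13, 15, 16, 18, 17, 19]
print(simulated_start_order(20, 4, 1))
# [0, 1, 2, ..., 19]
```

Under these assumptions a chunksize of 1 starts tasks in strictly numerical order, while larger chunks interleave them. Either way, the list returned by map itself keeps the input order; only the order of execution changes.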