multiprocessing Pool.map: call functions in a certain order

感情败类 2020-12-30 01:15

How can I make multiprocessing.pool.map distribute processes in numerical order?


More Info:
I have a program which processes a few thousand data files, mak

2 Answers
  • 2020-12-30 01:51

    What about changing map to imap:

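    import os
    import time
    from multiprocessing import Pool
    
    num_proc = 4
    num_calls = 20
    sleeper = 0.1
    
    def SomeFunc(arg):
        time.sleep(sleeper)
        # Print the worker's PID next to the argument it is handling.
        print("%s %5d" % (os.getpid(), arg))
        return arg
    
    if __name__ == "__main__":
        # The guard keeps child processes from re-running this block on
        # platforms that spawn workers (e.g. Windows, macOS).
        with Pool(num_proc) as proc_pool:
            # imap hands out one task at a time (its default chunksize is 1);
            # list() drains the lazy iterator so every call runs.
            list(proc_pool.imap(SomeFunc, range(num_calls)))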
    

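    The reason may be that imap's default chunksize is 1, while map computes a larger default chunksize. With imap, no worker is handed a batch of several consecutive arguments up front, so the calls start closer to numerical order.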
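    One way to confirm the two defaults without starting any processes is to read them from the method signatures. A small sketch, assuming CPython 3's multiprocessing.pool.Pool, where map declares chunksize=None and imap declares chunksize=1:

```python
import inspect
from multiprocessing.pool import Pool


def default_chunksize_param(method):
    # Read the declared default of the `chunksize` parameter from the
    # method's signature; no worker processes are created.
    return inspect.signature(method).parameters["chunksize"].default


map_default = default_chunksize_param(Pool.map)
imap_default = default_chunksize_param(Pool.imap)

print("Pool.map  chunksize default:", map_default)   # None: computed at call time
print("Pool.imap chunksize default:", imap_default)  # 1: one task per hand-off
```

    A default of None for map means the actual chunksize is only worked out when map is called, from the length of the iterable and the number of workers.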

  • 2020-12-30 01:56

    This happens because each process is given a fixed amount of work at the start of the call to map, and that amount depends on the chunksize. We can work out the default chunksize by looking at the source for pool.map:

    chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
    if extra:
      chunksize += 1
    

    So for range(20) and 4 processes, divmod(20, 4 * 4) gives (1, 4); because the remainder is non-zero, the chunksize is rounded up to 2.
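    The formula is easy to replay for other input sizes. A minimal sketch that wraps the quoted lines in a hypothetical helper, default_chunksize (an illustration, not part of the multiprocessing API):

```python
def default_chunksize(n_items: int, n_workers: int) -> int:
    """Hypothetical helper mirroring the default-chunksize formula quoted above."""
    # Aim for roughly four chunks per worker process.
    chunksize, extra = divmod(n_items, n_workers * 4)
    # Round up so that the leftover items still fit into the chunks.
    if extra:
        chunksize += 1
    return chunksize


for n in (16, 17, 20, 1000):
    print(n, "items, 4 workers -> chunksize", default_chunksize(n, 4))
```

    With a few thousand inputs the default chunksize grows into the hundreds, so each worker receives a long run of consecutive arguments at once, which moves the processing order even further from numerical order.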

    If we pass that chunksize explicitly, we should get results similar to the ones you are getting now:

    proc_pool.map(SomeFunc, range(num_calls), chunksize=2)

    This yields the output:

    0 2 6 4 1 7 5 3 8 10 12 14 9 13 15 11 16 18 17 19

    Setting chunksize=1 instead ensures that each process in the pool is given only one task at a time.

    proc_pool.map(SomeFunc, range(num_calls), chunksize=1)

    This gives a much closer approximation to numerical order than leaving chunksize unspecified, though not a strict one, since the workers still run concurrently. For example, chunksize=1 yields the output:

    0 1 2 3 4 5 6 7 9 10 8 11 13 12 15 14 16 17 19 18
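    The grouping in both outputs can be reproduced with a deliberately simplified model. This is not the real Pool scheduler: it assumes every call takes exactly one time step and that an idle worker always grabs the next unclaimed chunk. The helper name simulated_start_order is hypothetical.

```python
from typing import List


def simulated_start_order(n_tasks: int, n_workers: int, chunksize: int) -> List[List[int]]:
    """Return, for each time step, the arguments whose calls start in that step.

    Simplified model (an assumption): every call takes one step, chunks are
    consecutive runs of `chunksize` arguments, and an idle worker takes the
    next chunk.
    """
    # Split range(n_tasks) into consecutive chunks.
    chunks = [list(range(start, min(start + chunksize, n_tasks)))
              for start in range(0, n_tasks, chunksize)]
    chunks.reverse()  # so that pop() hands out the lowest chunk first
    workers: List[List[int]] = [[] for _ in range(n_workers)]
    steps: List[List[int]] = []
    while chunks or any(workers):
        started = []
        for backlog in workers:
            # An idle worker fetches the next chunk, if any remain.
            if not backlog and chunks:
                backlog.extend(chunks.pop())
            # A busy worker starts the next argument of its own chunk.
            if backlog:
                started.append(backlog.pop(0))
        steps.append(started)
    return steps


for size in (2, 1):
    print("chunksize", size, "->", simulated_start_order(20, 4, size))
```

    With chunksize=2 the model starts [0, 2, 4, 6], then [1, 3, 5, 7], then [8, 10, 12, 14], and so on; with chunksize=1 each step is simply the next four integers. Within a step the real order is arbitrary because the workers run concurrently, which is why the observed output reads 0 2 6 4 rather than 0 2 4 6, but the groups match both outputs above.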
