Do multiprocessing pools give every process the same number of tasks, or are they assigned as available?

梦毁少年i 2020-12-24 14:15

When you map an iterable to a multiprocessing.Pool are the iterations divided into a queue for each process in the pool at the start, or is there a

3 answers
  •  时光说笑
    2020-12-24 14:34

    To estimate the chunksize used by a Python implementation without reading its multiprocessing module source code, run:

    #!/usr/bin/env python
    import multiprocessing as mp
    from itertools import groupby
    
    def work(index):
        mp.get_logger().info(index)
        return index, mp.current_process().name
    
    if __name__ == "__main__":
        import logging
        import sys
        logger = mp.log_to_stderr()
    
        # process cmdline args
        try:
            sys.argv.remove('--verbose')
        except ValueError:
            pass  # not verbose
        else:
            logger.setLevel(logging.INFO)  # verbose
        nprocesses, nitems = int(sys.argv.pop(1)), int(sys.argv.pop(1))
        # choices: 'map', 'imap', 'imap_unordered'
        map_name = sys.argv[1] if len(sys.argv) > 1 else 'map'
        kwargs = dict(chunksize=int(sys.argv[2])) if len(sys.argv) > 2 else {}
    
        # estimate chunksize used: the longest run of consecutive indices
        # handled by the same worker process
        max_chunksize = 0
        with mp.Pool(nprocesses) as pool:  # context manager needs Python 3.3+
            map_func = getattr(pool, map_name)
            results = sorted(map_func(work, range(nitems), **kwargs),
                             key=lambda x: x[0])  # sort by index
        for _, group in groupby(results, key=lambda x: x[1]):  # group by process name
            max_chunksize = max(max_chunksize, len(list(group)))
        print("%s: max_chunksize %d" % (map_name, max_chunksize))
    

    It shows that imap and imap_unordered use chunksize=1 by default, while max_chunksize for map depends on nprocesses and nitems (the number of chunks per process is not fixed) and on the Python version. All *map* functions honor the chunksize parameter if it is specified.
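    For comparison with the measured value, here is a minimal sketch of how CPython's Pool.map picks its default chunksize when none is given, as far as I know from the CPython source: it aims for roughly four chunks per worker. The function name default_map_chunksize is hypothetical, and other Python versions or implementations may use a different rule.

```python
def default_map_chunksize(nitems, nprocesses):
    """Approximate CPython's default chunksize for Pool.map (an assumption,
    not a documented guarantee): about four chunks per worker process."""
    if nitems == 0:
        return 0
    chunksize, extra = divmod(nitems, nprocesses * 4)
    if extra:
        chunksize += 1  # round up so every item is covered
    return chunksize


if __name__ == "__main__":
    for nprocesses, nitems in [(4, 100), (4, 16), (4, 3)]:
        print(nprocesses, nitems, default_map_chunksize(nitems, nprocesses))
```

    The script above can report a larger max_chunksize than this, because one worker may pick up two adjacent chunks in a row.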

    Usage

    $ ./estimate_chunksize.py nprocesses nitems [map_name [chunksize]] [--verbose]
    

    To see how individual jobs are distributed, specify the --verbose parameter.
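    As a rough illustration of what gets distributed, the sketch below splits an iterable into chunks of a given size. In CPython, to my understanding, each such chunk is placed on a shared task queue and picked up by whichever worker is free, so the number of chunks per process is not fixed in advance. The helper split_into_chunks is hypothetical and is not part of the multiprocessing API.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    """Yield consecutive tuples of at most `chunksize` items."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 10 items with chunksize 4 give two full chunks and one short one
    for chunk in split_into_chunks(range(10), 4):
        print(chunk)
```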
