Python: how to do multiprocessing inside of a class?

余生分开走 2021-02-01 03:43

I have a code structure that looks like this:

Class A:
  def __init__(self):
    processes = []
    for i in range(1000):
      p = Process(target=self.RunProces         


        
3 Answers
  • 2021-02-01 04:21

    There are a couple of syntax issues that I can see in your code:

    • args in Process expects a tuple, but you pass an integer. Please change line 5 to:

      p = Process(target=self.RunProcess, args=(i,))

    • list.append is a method, so its argument goes in parentheses (), not square brackets []. Please change line 6 to:

      processes.append(p)
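
To make the two fixes above concrete, here is a minimal sketch. The names `work` and `build_processes` are hypothetical stand-ins for the question's `RunProcess` and constructor loop, not part of the original code. `Process` unpacks `args` into positional arguments for the target, so a single argument has to be wrapped as the one-element tuple `(i,)`.

```python
import multiprocessing as mp


def work(i):
    # Hypothetical stand-in for the question's RunProcess method.
    print(i * i)


def build_processes(n):
    processes = []
    for i in range(n):
        # args is unpacked as work(*args), so one argument needs a 1-tuple.
        p = mp.Process(target=work, args=(i,))
        # append is a method call: parentheses, not square brackets.
        processes.append(p)
    return processes


if __name__ == "__main__":
    procs = build_processes(3)
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Writing `processes.append[p]` instead raises a TypeError, because a method object cannot be subscripted.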

    As @qarma points out, it's not good practice to start the processes in the class constructor. I would structure the code as follows (adapting your example):

    import multiprocessing as mp
    from time import sleep
    
    class A(object):
        def __init__(self, *args, **kwargs):
            # do other stuff
            pass
    
        def do_something(self, i):
            sleep(0.2)
            print('%s * %s = %s' % (i, i, i*i))
    
        def run(self):
            processes = []
    
            for i in range(1000):
                p = mp.Process(target=self.do_something, args=(i,))
                processes.append(p)
    
            # Start every process, then wait for all of them to finish.
            for p in processes:
                p.start()
            for p in processes:
                p.join()
    
    
    if __name__ == '__main__':
        a = A()
        a.run()
    
  • 2021-02-01 04:23

    A practical work-around is to break down your class, e.g. like this:

    class A:
        def __init__(self, ...):
            pass
    
        def compute(self):
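            # target must be passed by keyword: the first positional
            # parameter of Process is group, not target.
            procs = [Process(target=self.run, ...) for ... in ...]
            for p in procs:
                p.start()
            for p in procs:
                p.join()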
    
        def run(self, ...):
            pass
    
    pool = A(...)
    pool.compute()
    

    When you fork a process inside __init__, the class instance self may not be fully initialised yet, so it's odd to ask a subprocess to execute self.run, although technically it is possible.

    If it's not that, then it sounds like an instance of this issue:

    http://bugs.python.org/issue11240
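
To illustrate the point about starting a process inside __init__, here is a small sketch. The class `Early` and the names `report`, `ready` and `late` are made up for this example. The child only sees the attributes that existed when start() was called; anything assigned afterwards stays in the parent.

```python
import multiprocessing as mp


class Early:
    """Hypothetical class that starts a process from its constructor."""

    def __init__(self, out):
        self.ready = True                 # exists before the child starts
        proc = mp.Process(target=self.report, args=(out,))
        proc.start()                      # child gets self as it is right now
        self.late = True                  # assigned in the parent only
        self.proc = proc

    def report(self, out):
        # Send back the attribute names this copy of self actually has.
        out.put(sorted(vars(self)))


if __name__ == "__main__":
    q = mp.Queue()
    e = Early(q)
    print("child saw: ", q.get())         # read the result before joining
    e.proc.join()
    print("parent has:", sorted(vars(e)))
```

Run as a script, the child should report only `['ready']`, while the parent ends up with `['late', 'proc', 'ready']`. Moving the start() call out of __init__ into a separate method avoids the half-built instance.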

  • 2021-02-01 04:29

    Using a Pool should simplify things for you. As for speed, starting up the processes does take time. However, using a Pool, as opposed to running njobs separate Process objects, should be as fast as you can get with processes. By default, a Pool (as used below) starts as many worker processes as you have CPUs, and hands a new job to a worker as soon as that worker finishes its current one. You won't get njobs-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing them. I'm using pathos, which includes a fork of multiprocessing, because it's a bit more robust than standard multiprocessing… and, well, I'm also the author. But you could probably use multiprocessing for this.

    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> class A(object):
    ...   def __init__(self, njobs=1000):
    ...     self.map = Pool().map
    ...     self.njobs = njobs
    ...     self.start()
    ...   def start(self):
    ...     self.result = self.map(self.RunProcess, range(self.njobs))
    ...     return self.result
    ...   def RunProcess(self, i):
    ...     return i*i
    ... 
    >>> myA = A()
    >>> myA.result[:11]
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
    >>> myA.njobs = 3
    >>> myA.start()  
    [0, 1, 4]
    

    It's a bit of an odd design to start the Pool inside of __init__. But if you want to do that, you have to get results from something like self.result… and you can use self.start for subsequent calls.
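
For comparison, here is a rough sketch of the same idea with the standard library's multiprocessing.Pool on Python 3, with the work started from an explicit method rather than from __init__. The names `Squarer`, `square` and `run` are hypothetical, and this is not the pathos API shown above. The pool is created inside `run` instead of being stored on self: a bound method is pickled together with its instance, and Pool objects cannot be pickled.

```python
import multiprocessing as mp


class Squarer:
    """Hypothetical counterpart of class A using the standard library."""

    def __init__(self, njobs=1000):
        self.njobs = njobs            # only plain, picklable state on self

    def square(self, i):
        return i * i

    def run(self, processes=None):
        # processes=None lets Pool choose a worker count from the CPU count.
        with mp.Pool(processes) as pool:
            # map blocks until every job is done and keeps input order.
            return pool.map(self.square, range(self.njobs))


if __name__ == "__main__":
    print(Squarer(njobs=11).run())
    # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Calling `run` again on the same instance simply builds a fresh pool, which plays the role of `self.start` in the pathos example.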

    Get pathos here: https://github.com/uqfoundation
