I have a code structure that looks like this:
class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            p = Process(target=self.RunProcess, args=i)
            processes.append[p]
There are a couple of syntax issues that I can see in your code:

1. args in Process expects a tuple, but you pass an integer. Please change line 5 to:

   p = Process(target=self.RunProcess, args=(i,))

2. list.append is a method, and arguments passed to it should be enclosed in (), not []. Please change line 6 to:

   processes.append(p)
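Putting both fixes together, the loop would look something like the sketch below (the body of RunProcess is just a placeholder, since the original wasn't shown):

from multiprocessing import Process

class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            # args must be a tuple; append uses parentheses, not brackets
            p = Process(target=self.RunProcess, args=(i,))
            processes.append(p)

    def RunProcess(self, i):
        print(i)  # placeholder body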
As @qarma points out, it's not good practice to start the processes in the class constructor. I would structure the code as follows (adapting your example):
import multiprocessing as mp
from time import sleep

class A(object):
    def __init__(self, *args, **kwargs):
        # do other stuff
        pass

    def do_something(self, i):
        sleep(0.2)
        print('%s * %s = %s' % (i, i, i*i))

    def run(self):
        processes = []
        for i in range(1000):
            p = mp.Process(target=self.do_something, args=(i,))
            processes.append(p)
        [x.start() for x in processes]

if __name__ == '__main__':
    a = A()
    a.run()
A practical work-around is to break down your class, e.g. like this:
class A:
    def __init__(self, ...):
        pass

    def compute(self):
        procs = [Process(target=self.run, ...) for ... in ...]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, ...):
        pass

pool = A(...)
pool.compute()
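Filled in with concrete values, that sketch could read as follows (the squaring job and the count of 4 jobs are just illustrative assumptions):

from multiprocessing import Process

class A:
    def __init__(self, njobs):
        self.njobs = njobs

    def compute(self):
        # one Process per job: start them all, then wait for them all
        procs = [Process(target=self.run, args=(i,)) for i in range(self.njobs)]
        [p.start() for p in procs]
        [p.join() for p in procs]

    def run(self, i):
        print('%s * %s = %s' % (i, i, i * i))

if __name__ == '__main__':
    pool = A(4)
    pool.compute()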
When you fork a process inside __init__, the class instance self may not be fully initialised anyway, thus it's odd to ask a subprocess to execute self.run, although technically, yes, it's possible.
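To illustrate that hazard, here is a contrived sketch (the Broken class and its attribute are hypothetical, not from the original post): if the process is started before __init__ finishes, the child gets a copy of self that is missing anything assigned afterwards.

from multiprocessing import Process

class Broken:
    def __init__(self):
        p = Process(target=self.run)
        p.start()               # child gets a snapshot of self *right now*
        self.message = 'ready'  # assigned after the start; the child never sees it
        p.join()

    def run(self):
        print(self.message)     # AttributeError in the child process

if __name__ == '__main__':
    Broken()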
If it's not that, then it sounds like an instance of this issue:
http://bugs.python.org/issue11240
It should simplify things for you to use a Pool. As far as speed goes, starting up the processes does take time. However, using a Pool as opposed to running njobs instances of Process should be as fast as you can get it to run with processes. The default setting for a Pool (as used below) is to use the maximum number of processes available (i.e. the number of CPUs you have), and to keep farming out new jobs to a worker as soon as a job completes. You won't get njobs-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing your processors.

I'm using pathos, which has a fork of multiprocessing, because it's a bit more robust than standard multiprocessing… and, well, I'm also the author. But you could probably use multiprocessing for this.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class A(object):
...     def __init__(self, njobs=1000):
...         self.map = Pool().map
...         self.njobs = njobs
...         self.start()
...     def start(self):
...         self.result = self.map(self.RunProcess, range(self.njobs))
...         return self.result
...     def RunProcess(self, i):
...         return i*i
...
>>> myA = A()
>>> myA.result[:11]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> myA.njobs = 3
>>> myA.start()
[0, 1, 4]
It's a bit of an odd design to start the Pool inside of __init__. But if you want to do that, you have to get results from something like self.result… and you can use self.start for subsequent calls.
Get pathos here: https://github.com/uqfoundation
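Since the answer notes you could probably use multiprocessing for this, here is a rough standard-library equivalent (a sketch, assuming a recent Python 3 where bound methods of a picklable instance can themselves be pickled):

from multiprocessing import Pool

class A(object):
    def __init__(self, njobs=1000):
        self.njobs = njobs
        self.start()

    def start(self):
        # create the pool per call, so the instance itself stays picklable
        with Pool() as pool:
            self.result = pool.map(self.RunProcess, range(self.njobs))
        return self.result

    def RunProcess(self, i):
        return i * i

if __name__ == '__main__':
    myA = A()
    print(myA.result[:11])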