I have a code structure that looks like this:

class A:
    def __init__(self):
        processes = []
        for i in range(1000):
            p = Process(target=self.RunProcess, ...)
            ...
It should simplify things for you to use a `Pool`. As far as speed, starting up the processes does take time. However, using a `Pool` as opposed to running `njobs` instances of `Process` should be as fast as you can get it to run with processes. The default setting for a `Pool` (as used below) is to use the maximum number of processes available (i.e. the number of CPUs you have), and to keep farming out new jobs to a worker as soon as a job completes. You won't get `njobs`-way parallelism, but you'll get as much parallelism as your CPUs can handle without oversubscribing your processors. I'm using `pathos`, which has a fork of `multiprocessing`, because it's a bit more robust than standard `multiprocessing`… and, well, I'm also the author. But you could probably use `multiprocessing` for this.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class A(object):
...     def __init__(self, njobs=1000):
...         self.map = Pool().map
...         self.njobs = njobs
...         self.start()
...     def start(self):
...         self.result = self.map(self.RunProcess, range(self.njobs))
...         return self.result
...     def RunProcess(self, i):
...         return i*i
... 
>>> myA = A()
>>> myA.result[:11]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> myA.njobs = 3
>>> myA.start()
[0, 1, 4]
It's a bit of an odd design to start the `Pool` inside of `__init__`. But if you want to do that, you have to get results from something like `self.result`… and you can use `self.start` for subsequent calls.
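If you'd rather use standard `multiprocessing`, roughly the same thing might look like the untested sketch below. The caveats are why I reach for `pathos`: with the standard library the class has to be defined at module level (it can't pickle interactively-defined classes, so the interpreter session above wouldn't work), bound methods only pickle on Python 3, and the `Pool` is created inside `start` rather than stored on the instance, since a `Pool` itself isn't picklable and would get dragged along when `self.RunProcess` is shipped to the workers.

from multiprocessing import Pool

class A(object):
    def __init__(self, njobs=1000):
        self.njobs = njobs
        self.result = None
        self.start()

    def start(self):
        # Create the Pool here instead of storing it on self: a Pool is
        # not picklable, and pickling the bound method self.RunProcess
        # pickles self (and everything on it) along with it.
        with Pool() as pool:
            self.result = pool.map(self.RunProcess, range(self.njobs))
        return self.result

    def RunProcess(self, i):
        return i*i

if __name__ == '__main__':
    myA = A()
    print(myA.result[:11])   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The `__main__` guard matters on platforms that spawn rather than fork (e.g. Windows), where each worker re-imports the main module.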
Get `pathos` here: https://github.com/uqfoundation