I'm trying to use multiprocessing's Pool.map() function to divide out work simultaneously. When I use the following code, it works fine:
There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.
As everyone has said, the problem is that the multiprocessing code has to pickle the things that it sends to the sub-processes it has started, and the pickler doesn't handle instance methods.
However, instead of sending the instance method, you can send the actual class instance, plus the name of the function to call, to an ordinary function that then uses getattr to call the instance method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method, except that you can call more than one member function.
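A minimal sketch of that idea, written in Python 3 syntax. It uses a pickle round trip as a stand-in for what multiprocessing does when it ships a task to a worker, so it runs without starting any processes. The Counter class, its methods, and run_remotely are hypothetical names chosen for illustration; call_it plays the role of the ordinary function described above.

```python
import pickle


def call_it(instance, name, args=(), kwargs=None):
    """Look up a method by name on the instance and call it."""
    if kwargs is None:
        kwargs = {}
    return getattr(instance, name)(*args, **kwargs)


class Counter(object):  # hypothetical example class
    def __init__(self, start):
        self.start = start

    def add(self, n):
        return self.start + n

    def times(self, n):
        return self.start * n


def run_remotely(instance, name, args=()):
    """Simulate the trip to a worker: pickle the task, unpickle it, call it."""
    payload = pickle.dumps((instance, name, args))
    remote_instance, remote_name, remote_args = pickle.loads(payload)
    return call_it(remote_instance, remote_name, remote_args)


c = Counter(10)
added = run_remotely(c, "add", (5,))         # calls Counter.add on the copy
multiplied = run_remotely(c, "times", (3,))  # a second method, same helper
```

With a real pool, the same pieces would be passed as pool.apply_async(call_it, args=(c, "add", (5,))). Only the instance and a string cross the process boundary, and the bound method is created on the other side.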
Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it hence all the name changes and such, for some reason this seemed easier than cut-and-paste :-) ) for illustration of all the magic:
import multiprocessing
import os

def call_it(instance, name, args=(), kwargs=None):
    "indirect caller for instance methods and multiprocessing"
    if kwargs is None:
        kwargs = {}
    return getattr(instance, name)(*args, **kwargs)

class Klass(object):
    def __init__(self, nobj, workers=multiprocessing.cpu_count()):
        print("Constructor (in pid=%d)..." % os.getpid())
        self.count = 1
        pool = multiprocessing.Pool(processes=workers)
        # send the instance plus the method name, not the bound method
        async_results = [pool.apply_async(call_it,
            args=(self, 'process_obj', (i,))) for i in range(nobj)]
        pool.close()
        # wait for every task to finish before collecting the results
        for r in async_results:
            r.wait()
        lst_results = [r.get() for r in async_results]
        print(lst_results)

    def __del__(self):
        self.count -= 1
        print("... Destructor (in pid=%d) count=%d" % (os.getpid(), self.count))

    def process_obj(self, index):
        print("object %d" % index)
        return "results"

Klass(nobj=8, workers=3)
The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called 9 times (once for each copy made, which is 2 or 3 times per pool worker process as needed, plus once in the original process). This is often OK, as in this case, since the default pickler makes a copy of the entire instance and (semi-)secretly re-populates it. In this case, it does:
obj = object.__new__(Klass)
obj.__dict__.update({'count':1})
That's why, even though the destructor is called eight times in the three worker processes, it counts down from 1 to 0 each time. Of course, you can still get into trouble this way. If necessary, you can provide your own __setstate__:
def __setstate__(self, adict):
    self.count = adict['count']
in this case for instance.
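To see that unpickling skips __init__ and goes through __setstate__ instead, here is a small sketch in Python 3 syntax. It only uses a pickle round trip, which is the same mechanism multiprocessing relies on. Tracked, init_calls, and restored are hypothetical names introduced for the illustration, and the added __getstate__ is optional.

```python
import pickle


class Tracked(object):  # hypothetical example class
    init_calls = 0  # class-level counter of how often __init__ ran

    def __init__(self):
        Tracked.init_calls += 1
        self.count = 1
        self.restored = False

    def __getstate__(self):
        # Ship only the fields the copy actually needs.
        return {"count": self.count}

    def __setstate__(self, state):
        # Runs on the copy in place of __init__.
        self.count = state["count"]
        self.restored = True


original = Tracked()
clone = pickle.loads(pickle.dumps(original))
```

After this runs, Tracked.init_calls is still 1 even though two instances exist, and only the clone has restored set to True.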
Update: as of the day of this writing, namedtuples are picklable (starting with Python 2.7).
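A quick way to check this is a pickle round trip, sketched here in Python 3 syntax. Point is a hypothetical example type, and the sketch assumes the namedtuple is defined at module level.

```python
import pickle
from collections import namedtuple

# The typename passed to namedtuple should match the module-level name,
# because pickle locates the class by that name when loading.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)
restored = pickle.loads(pickle.dumps(p))
```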
The issue here is that the child processes aren't able to import the class of the object (in this case, the class P). In a multi-module project, the class P should be importable anywhere the child processes are used.
A quick workaround is to make it importable by assigning it to globals():
globals()["P"] = P
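The effect can be sketched with a class that is built at run time, so that no module-level name points to it. This is an assumed scenario for illustration only (the original class P is not shown), written in Python 3 syntax and using pickle directly instead of a worker process. make_class, obj, and picklable_before are hypothetical names.

```python
import pickle


def make_class():
    # type() builds a class called "P" at run time. Nothing binds that
    # name at module level, so pickle cannot find the class by name.
    return type("P", (object,), {"__module__": __name__})


cls = make_class()
obj = cls()
obj.value = 42

try:
    pickle.dumps(obj)
    picklable_before = True
except (pickle.PicklingError, AttributeError):
    picklable_before = False

# The workaround from the text: bind the class under its own name.
globals()["P"] = cls

restored = pickle.loads(pickle.dumps(obj))
```

Once the name P resolves in the module, the same object pickles and unpickles normally. Defining the class at module level in an importable module achieves the same thing without touching globals().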
In this simple case, where someClass.f is not inheriting any data from the class and not attaching anything to the class, a possible solution would be to separate out f, so it can be pickled:
import multiprocessing

def f(x):
    return x*x

class someClass(object):
    def __init__(self):
        pass

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        print(pool.map(f, range(10)))
You could also define a __call__() method inside your someClass, which calls someClass.f(), and then pass an instance of someClass to the pool. This object is picklable and it works fine (for me)...
from multiprocessing import Pool

class someClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        p = Pool(4)
        # the instance itself is the callable handed to the pool
        sc = p.map(self, range(4))
        print(sc)

    def __call__(self, x):
        return self.f(x)

sc = someClass()
sc.go()
Why not use a separate function?
# inst, pool and arr are assumed to be defined elsewhere
def func(*args, **kwargs):
    return inst.method(*args, **kwargs)

print(pool.map(func, arr))
I ran into this same issue but found out that there is a JSON encoder that can be used to move these objects between processes.
import json
from pyVmomi.VmomiSupport import VmomiJSONEncoder
Use this to create your list:
jsonSerialized = json.dumps(pfVmomiObj, cls=VmomiJSONEncoder)
Then, in the mapped function, use this to load the data back (note that json.loads returns plain dicts and lists, not the original object):
pfVmomiObj = json.loads(jsonSerialized)
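The same pattern can be tried without pyVmomi installed. The sketch below, in Python 3 syntax, uses only the standard json module. Record and RecordEncoder are hypothetical stand-ins for the pyVmomi object and VmomiJSONEncoder, and worker stands in for the mapped function. Nothing here reflects how VmomiJSONEncoder is implemented.

```python
import json


class Record(object):  # hypothetical stand-in for a non-picklable object
    def __init__(self, name, cpus):
        self.name = name
        self.cpus = cpus


class RecordEncoder(json.JSONEncoder):
    """Hypothetical encoder playing the role of VmomiJSONEncoder."""

    def default(self, o):
        if isinstance(o, Record):
            return {"name": o.name, "cpus": o.cpus}
        return super().default(o)


def worker(serialized):
    """What a mapped function would do with the string it receives."""
    data = json.loads(serialized)  # a plain dict, not a Record
    return data["name"], data["cpus"] * 2


serialized = json.dumps(Record("vm-01", 4), cls=RecordEncoder)
result = worker(serialized)
```

Because only a string is passed to the pool, pickling is no longer a concern. The trade-off is that the worker sees plain data and cannot call methods of the original object.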