Can't pickle when using multiprocessing Pool.map()

Backend · Unresolved · 12 answers · 1772 views
醉梦人生 2020-11-22 00:19

I'm trying to use multiprocessing's Pool.map() function to divide work across multiple processes. When I use the following code, it works fine:

12 Answers
  •  [愿得一人]
    2020-11-22 01:06

    There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.

    As everyone has said the problem is that the multiprocessing code has to pickle the things that it sends to the sub-processes it has started, and the pickler doesn't do instance-methods.

    However, instead of sending the instance-method, you can send the actual class instance, plus the name of the function to call, to an ordinary function that then uses getattr to call the instance-method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method except that you can call more than one member function.

    Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it, hence all the name changes and such; for some reason that seemed easier than cut-and-paste :-) ) to illustrate all the magic:

    import multiprocessing
    import os
    
    def call_it(instance, name, args=(), kwargs=None):
        """Indirect caller for instance methods and multiprocessing."""
        if kwargs is None:
            kwargs = {}
        return getattr(instance, name)(*args, **kwargs)
    
    class Klass(object):
        def __init__(self, nobj, workers=multiprocessing.cpu_count()):
            print("Constructor (in pid=%d)..." % os.getpid())
            self.count = 1
            pool = multiprocessing.Pool(processes=workers)
            async_results = [pool.apply_async(call_it,
                args=(self, 'process_obj', (i,))) for i in range(nobj)]
            pool.close()
            # In Python 3, map() is lazy, so a bare
            # map(ApplyResult.wait, async_results) would do nothing;
            # wait on each result explicitly instead.
            for r in async_results:
                r.wait()
            lst_results = [r.get() for r in async_results]
            print(lst_results)
    
        def __del__(self):
            self.count -= 1
            print("... Destructor (in pid=%d) count=%d" % (os.getpid(), self.count))
    
        def process_obj(self, index):
            print("object %d" % index)
            return "results"
    
    Klass(nobj=8, workers=3)
    

    The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called 9 times (once for each copy made = 2 or 3 times per pool-worker-process as needed, plus once in the original process). This is often OK, as it is here, because the default pickler copies the entire instance and quietly re-populates the new copy. In this case it does:

    obj = object.__new__(Klass)
    obj.__dict__.update({'count':1})
    

    That's why, even though the destructor is called eight times across the three worker processes, each copy counts down from 1 to 0. Of course you can still get into trouble this way; if necessary, you can provide your own __setstate__:

        def __setstate__(self, adict):
            self.count = adict['count']
    

    in this case for instance.
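
    As a side note, you can watch this copy-and-repopulate behavior without multiprocessing at all: a plain pickle round-trip restores state through __setstate__ and skips __init__ entirely. A minimal sketch (the Counter class here is made up for illustration, not part of the code above):

    ```python
    import pickle

    class Counter:
        def __init__(self):
            # Only runs on normal construction, never on unpickle.
            self.count = 1

        def __getstate__(self):
            # Decide exactly what gets pickled.
            return {'count': self.count}

        def __setstate__(self, adict):
            # Runs on unpickle *instead of* __init__.
            self.count = adict['count']

    original = Counter()
    original.count = 5
    clone = pickle.loads(pickle.dumps(original))
    print(clone.count)  # 5: state restored via __setstate__
    ```

    The same pair of hooks also lets you drop unpicklable attributes (open files, locks, pools) in __getstate__ and rebuild them in __setstate__, which is the usual cure when an instance won't cross the process boundary.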
