Can't pickle when using multiprocessing Pool.map()

醉梦人生 2020-11-22 00:19

I'm trying to use multiprocessing's Pool.map() function to divide up work simultaneously. When I use the following code, it works fine:

12 Answers
  • 2020-11-22 01:06

    There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.

    As everyone has said, the problem is that multiprocessing has to pickle the things it sends to its sub-processes, and the pickler doesn't handle instance methods.

    However, instead of sending the instance method, you can send the actual class instance plus the name of the function to call to an ordinary function, which then uses getattr to call the instance method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method, except that you can call more than one member function.

    Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it, hence all the name changes and such; for some reason this seemed easier than cut-and-paste :-) ) to illustrate all the magic:

    import multiprocessing
    import os
    
    def call_it(instance, name, args=(), kwargs=None):
        "indirect caller for instance methods and multiprocessing"
        if kwargs is None:
            kwargs = {}
        return getattr(instance, name)(*args, **kwargs)
    
    class Klass(object):
        def __init__(self, nobj, workers=multiprocessing.cpu_count()):
            print "Constructor (in pid=%d)..." % os.getpid()
            self.count = 1
            pool = multiprocessing.Pool(processes=workers)
            # Ship the instance itself plus the *name* of the method; call_it
            # re-creates the bound method inside each worker.
            async_results = [pool.apply_async(call_it,
                args=(self, 'process_obj', (i,))) for i in range(nobj)]
            pool.close()
            # Block until every task has finished (map is eager in Python 2).
            map(multiprocessing.pool.ApplyResult.wait, async_results)
            lst_results = [r.get() for r in async_results]
            print lst_results
    
        def __del__(self):
            self.count -= 1
            print "... Destructor (in pid=%d) count=%d" % (os.getpid(), self.count)
    
        def process_obj(self, index):
            print "object %d" % index
            return "results"
    
    Klass(nobj=8, workers=3)
    

    The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called nine times (once for each copy made, i.e. two or three times per pool-worker process as needed, plus once in the original process). This is often OK, as in this case, since the default pickler makes a copy of the entire instance and (semi-)secretly re-populates it. In this case, it does:

    obj = object.__new__(Klass)
    obj.__dict__.update({'count':1})
    

    That's why, even though the destructor is called eight times in the three worker processes, it counts down from 1 to 0 each time. Of course you can still get into trouble this way. If necessary, you can provide your own __setstate__:

        def __setstate__(self, adict):
            self.count = adict['count']
    

    in this case for instance.
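
    For completeness, here is a minimal self-contained sketch (not part of the original answer; the log attribute is hypothetical) of a class that customizes both sides of the pickle protocol, which is the general fix when an instance carries something unpicklable:

    import pickle

    class Counter(object):
        def __init__(self):
            self.count = 1
            self.log = open('counter.log', 'w')   # file handles can't be pickled

        def __getstate__(self):
            # Copy the instance dict and drop the unpicklable file handle.
            state = self.__dict__.copy()
            del state['log']
            return state

        def __setstate__(self, state):
            # Restore the picklable attributes and recreate the handle.
            self.__dict__.update(state)
            self.log = open('counter.log', 'a')

    c = pickle.loads(pickle.dumps(Counter()))
    print(c.count)   # 1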

  • 2020-11-22 01:06

    Update: as of the day of this writing, namedtuples are picklable (starting with Python 2.7).

    The issue here is that the child processes aren't able to import the class of the object (in this case, the class P). In a multi-module project, the class P should be importable anywhere the child process is used.

    A quick workaround is to make it importable by assigning it to globals():

    globals()["P"] = P
    
  • 2020-11-22 01:07

    In this simple case, where someClass.f is not inheriting any data from the class and not attaching anything to the class, a possible solution would be to separate out f, so it can be pickled:

    import multiprocessing
    
    
    # f lives at module top level, so it can be pickled by name.
    def f(x):
        return x*x
    
    
    class someClass(object):
        def __init__(self):
            pass
    
        def go(self):
            pool = multiprocessing.Pool(processes=4)       
            print pool.map(f, range(10))
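
    For completeness, a way to run it (a __main__ guard is worth adding, since platforms that spawn rather than fork worker processes re-import the module):

    if __name__ == '__main__':
        someClass().go()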
    
  • 2020-11-22 01:10

    You could also define a __call__() method inside your someClass(), which calls someClass.f(), and then pass an instance of someClass() to the pool. This object is pickleable and it works fine (for me)...

    from multiprocessing import Pool

    class someClass(object):
        def __init__(self):
            pass

        def f(self, x):
            return x*x

        def go(self):
            p = Pool(4)
            sc = p.map(self, range(4))
            print sc

        def __call__(self, x):
            return self.f(x)
    sc = someClass()
    sc.go()
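
    As a side note, this workaround is only needed on Python 2: in Python 3, bound methods pickle out of the box, so a sketch like the following works without the __call__ shim (Python 3 syntax):

    from multiprocessing import Pool

    class someClass(object):
        def f(self, x):
            return x*x

        def go(self):
            with Pool(4) as p:
                print(p.map(self.f, range(4)))

    if __name__ == '__main__':
        someClass().go()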
    
  • 2020-11-22 01:13

    Why not use a separate function?

    def func(*args, **kwargs):
        return inst.method(*args, **kwargs)
    
    print pool.map(func, arr)
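
    A self-contained version of the same idea (Python 3; the class and the inst and arr names are illustrative). The wrapper lives at module level, so it is picklable by name:

    from multiprocessing import Pool

    class SomeClass(object):
        def method(self, x):
            return x * x

    inst = SomeClass()

    def func(x):
        # Module-level function: picklable; forwards to the instance method.
        return inst.method(x)

    if __name__ == '__main__':
        pool = Pool(4)
        arr = range(10)
        print(pool.map(func, arr))
        pool.close()
        pool.join()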
    
  • 2020-11-22 01:22

    I ran into this same issue but found out that there is a JSON encoder that can be used to move these objects between processes.

    import json
    from pyVmomi.VmomiSupport import VmomiJSONEncoder
    

    Use this to create your list:

    jsonSerialized = json.dumps(pfVmomiObj, cls=VmomiJSONEncoder)
    

    Then in the mapped function, use this to recover the object:

    pfVmomiObj = json.loads(jsonSerialized)
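
    Putting the pieces together, a sketch of the round trip (the worker function and the name field are illustrative, and vm_objects stands for pyVmomi objects you have already retrieved). Note that json.loads returns plain dicts and lists rather than live pyVmomi objects, which is fine when the workers only need to read the data:

    import json
    from multiprocessing import Pool
    from pyVmomi.VmomiSupport import VmomiJSONEncoder

    def worker(serialized):
        # Rebuild a plain-dict view of the object inside the child process.
        obj = json.loads(serialized)
        return obj.get('name')          # hypothetical field

    if __name__ == '__main__':
        vm_objects = []                 # assumed: objects already fetched from vCenter
        payloads = [json.dumps(vm, cls=VmomiJSONEncoder) for vm in vm_objects]
        with Pool(4) as pool:
            print(pool.map(worker, payloads))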
    