Can't pickle when using multiprocessing Pool.map()

Asked by 醉梦人生 · 2020-11-22 00:19 · tagged: backend · 12 answers · 1766 views

I'm trying to use multiprocessing's Pool.map() function to divide up work simultaneously. When I use the following code, it works fine:
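
    import multiprocessing

    def f(x):
        return x*x

    def go():
        pool = multiprocessing.Pool(processes=4)
        print pool.map(f, range(10))

However, when I move f and go into a class (the someClass referenced in the answers below) and call pool.map(self.f, range(10)) inside go(), it fails with:

    PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
    __builtin__.instancemethod failed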

12 answers
  • 2020-11-22 01:06

    There's another short-cut you can use, although it can be inefficient depending on what's in your class instances.

    As everyone has said, the problem is that multiprocessing has to pickle the things it sends to its sub-processes, and the pickler doesn't handle instance methods.
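
    You can reproduce the failure without multiprocessing at all. A minimal sketch, assuming Python 2 (where bound methods are not picklable; the class C is just for illustration):

    import pickle

    class C(object):
        def f(self):
            return 1

    # Under Python 2 this raises PicklingError: Can't pickle
    # <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
    pickle.dumps(C().f)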

    However, instead of sending the instance-method, you can send the actual class instance, plus the name of the function to call, to an ordinary function that then uses getattr to call the instance-method, thus creating the bound method in the Pool subprocess. This is similar to defining a __call__ method except that you can call more than one member function.

    Stealing @EricH.'s code from his answer and annotating it a bit (I retyped it hence all the name changes and such, for some reason this seemed easier than cut-and-paste :-) ) for illustration of all the magic:

    import multiprocessing
    import os
    
    def call_it(instance, name, args=(), kwargs=None):
        "indirect caller for instance methods and multiprocessing"
        if kwargs is None:
            kwargs = {}
        # look up the bound method by name on the (unpickled) instance copy
        return getattr(instance, name)(*args, **kwargs)
    
    class Klass(object):
        def __init__(self, nobj, workers=multiprocessing.cpu_count()):
            print "Constructor (in pid=%d)..." % os.getpid()
            self.count = 1
            pool = multiprocessing.Pool(processes=workers)
            # call_it is a plain module-level function, so it pickles fine;
            # the instance (self) and the method name travel as ordinary arguments
            async_results = [pool.apply_async(call_it,
                args=(self, 'process_obj', (i,))) for i in range(nobj)]
            pool.close()
            # wait for every task, then collect the results
            for r in async_results:
                r.wait()
            lst_results = [r.get() for r in async_results]
            print lst_results
    
        def __del__(self):
            self.count -= 1
            print "... Destructor (in pid=%d) count=%d" % (os.getpid(), self.count)
    
        def process_obj(self, index):
            print "object %d" % index
            return "results"
    
    Klass(nobj=8, workers=3)
    

    The output shows that, indeed, the constructor is called once (in the original pid) and the destructor is called 9 times (once for each copy made = 2 or 3 times per pool-worker-process as needed, plus once in the original process). This is often OK, as in this case, since the default pickler makes a copy of the entire instance and (semi-) secretly re-populates it—in this case, doing:

    obj = object.__new__(Klass)
    obj.__dict__.update({'count':1})
    

    —that's why even though the destructor is called eight times in the three worker processes, it counts down from 1 to 0 each time—but of course you can still get into trouble this way. If necessary, you can provide your own __setstate__:

        def __setstate__(self, adict):
            self.count = adict['count']
    

    in this case for instance.
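
    For completeness, a minimal sketch of the full pickling round-trip (the cache attribute is a hypothetical stand-in for state you would not want shipped to the workers; by default pickle just copies __dict__, so the __getstate__ here is optional):

    import pickle

    class Counter(object):
        def __init__(self):
            self.count = 1
            self.cache = {}  # hypothetical expensive state we don't want to ship

        def __getstate__(self):
            # send only what the worker copy actually needs
            return {'count': self.count}

        def __setstate__(self, adict):
            self.count = adict['count']
            self.cache = {}  # rebuild the rest locally in the new process

    clone = pickle.loads(pickle.dumps(Counter()))
    print clone.count  # -> 1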

  • 2020-11-22 01:06

    Update: as of this writing, namedtuples are picklable (starting with Python 2.7).

    The issue here is that the child processes aren't able to import the class of the object (in this case, the class P). In a multi-module project, the class P should be importable anywhere the child process is used.

    A quick workaround is to make it importable by assigning it to globals():

    globals()["P"] = P
    
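    A fuller sketch of that workaround (an assumption-laden illustration: the class is created at function scope, and a fork-based start method is assumed, so the pool workers inherit the module globals created before the Pool starts):

    import multiprocessing

    def square(p):
        return p.x * p.x

    def make_class():
        class P(object):            # defined at function scope, so pickle
            def __init__(self, x):  # cannot normally find it by name
                self.x = x
        globals()["P"] = P          # the workaround: expose it at module level
        return P

    if __name__ == '__main__':
        P = make_class()
        pool = multiprocessing.Pool(2)
        print pool.map(square, [P(1), P(2), P(3)])
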
  • 2020-11-22 01:07

    In this simple case, where someClass.f is not inheriting any data from the class and not attaching anything to the class, a possible solution would be to separate out f, so it can be pickled:

    import multiprocessing
    
    
    def f(x):
        return x*x
    
    
    class someClass(object):
        def __init__(self):
            pass
    
        def go(self):
            pool = multiprocessing.Pool(processes=4)
            # f is a plain module-level function, so it pickles without trouble
            print pool.map(f, range(10))
    
    
    someClass().go()
    
  • 2020-11-22 01:10

    You could also define a __call__() method inside your someClass(), which delegates to the method you want to run (here someClass.f()), and then pass an instance of someClass() to the pool. This object is pickleable and it works fine (for me)...

    from multiprocessing import Pool
    
    class someClass(object):
        def __init__(self):
            pass
    
        def f(self, x):
            return x*x
    
        def go(self):
            p = Pool(4)
            # the instance itself is pickled; each worker invokes __call__
            sc = p.map(self, range(4))
            print sc
    
        def __call__(self, x):
            return self.f(x)
    
    sc = someClass()
    sc.go()
    
  • 2020-11-22 01:13

    Why not use a separate func?

    def func(*args, **kwargs):
        return inst.method(*args, **kwargs)
    
    print pool.map(func, arr)
    
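    Spelled out as a complete script (a sketch: Worker, inst, and arr are placeholder names not from the snippet above, and it relies on inst living at module level so the worker processes can see it):

    import multiprocessing

    class Worker(object):
        def method(self, x):
            return x * x

    inst = Worker()  # module-level, so worker processes inherit/recreate it

    def func(x):
        # module-level function: picklable by name, unlike inst.method
        return inst.method(x)

    if __name__ == '__main__':
        pool = multiprocessing.Pool(4)
        arr = range(10)
        print pool.map(func, arr)
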
  • 2020-11-22 01:22

    I ran into this same issue but found out that there is a JSON encoder that can be used to move these objects between processes.

    import json
    from pyVmomi.VmomiSupport import VmomiJSONEncoder
    

    Use this to create your list:

    jsonSerialized = json.dumps(pfVmomiObj, cls=VmomiJSONEncoder)
    

    Then in the mapped function, use this to recover the data (note that json.loads returns plain dicts and lists rather than live pyVmomi objects):

    pfVmomiObj = json.loads(jsonSerialized)
    