Python: Efficient workaround for multiprocessing a function that is a data member of a class, from within that class

春和景丽 2020-12-09 06:36

I'm aware of various discussions of the limitations of the multiprocessing module when dealing with functions that are data members of a class (due to pickling problems).

3 Answers
  • 2020-12-09 07:02

    Steven Bethard has posted a way to allow methods to be pickled/unpickled. You could use it like this:

    import multiprocessing as mp
    import copy_reg
    import types
    
    def _pickle_method(method):
        # Author: Steven Bethard
        # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
        func_name = method.im_func.__name__
        obj = method.im_self
        cls = method.im_class
        cls_name = ''
        if func_name.startswith('__') and not func_name.endswith('__'):
            cls_name = cls.__name__.lstrip('_')
        if cls_name:
            func_name = '_' + cls_name + func_name
        return _unpickle_method, (func_name, obj, cls)
    
    def _unpickle_method(func_name, obj, cls):
        # Author: Steven Bethard
        # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
        for cls in cls.mro():
            try:
                func = cls.__dict__[func_name]
            except KeyError:
                pass
            else:
                break
        return func.__get__(obj, cls)
    
    # This call to copy_reg.pickle allows you to pass methods as the first arg to
    # mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
    # PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
    # __builtin__.instancemethod failed
    
    copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
    
    class MyClass(object):
    
        def __init__(self):
            self.my_args = [1,2,3,4]
            self.output  = {}
    
        def my_single_function(self, arg):
            return arg**2
    
        def my_parallelized_function(self):
            # Use map or map_async to map my_single_function onto the
            # list of self.my_args, and append the return values into
            # self.output, using each arg in my_args as the key.
    
            # The result should make self.output become
            # {1:1, 2:4, 3:9, 4:16}
            # Note: `pool` is the module-level mp.Pool created after the class definition.
            self.output = dict(zip(self.my_args,
                                   pool.map(self.my_single_function, self.my_args)))
    

    Then

    pool = mp.Pool()   
    foo = MyClass()
    foo.my_parallelized_function()
    

    yields

    print foo.output
    # {1: 1, 2: 4, 3: 9, 4: 16}
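
    The recipe above targets Python 2 (`copy_reg`, `im_func`, `print` statement). On Python 3.4 and later, bound methods can be pickled without any registration, so a rough equivalent might look like the sketch below. It assumes the instance itself is picklable and creates the pool locally inside the method; the module-level round trip is only there to illustrate what `pool.map` does with the method.

```python
import multiprocessing as mp
import pickle


class MyClass:
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}

    def my_single_function(self, arg):
        return arg ** 2

    def my_parallelized_function(self):
        # The pool is a local variable, not an attribute: pool.map pickles
        # the bound method together with `self`, and a Pool object stored
        # on the instance would make that pickling fail.
        with mp.Pool() as pool:
            results = pool.map(self.my_single_function, self.my_args)
        self.output = dict(zip(self.my_args, results))


# The round trip that pool.map relies on: the bound method is pickled
# along with a copy of its instance, then rebuilt on the other side.
foo = MyClass()
restored = pickle.loads(pickle.dumps(foo.my_single_function))
squares = [restored(arg) for arg in foo.my_args]
```

    Calling `MyClass().my_parallelized_function()` under an `if __name__ == "__main__":` guard should leave `output` equal to `{1: 1, 2: 4, 3: 9, 4: 16}`, the same result as the Python 2 version.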
    
  • 2020-12-09 07:03

    I believe there is a more elegant solution. Add the following lines to the code that does the multiprocessing with the class, and you can still pass the method to the pool. The code should go above the class definition.

    import copy_reg
    import types
    
    def _reduce_method(meth):
        return (getattr, (meth.__self__, meth.__func__.__name__))
    copy_reg.pickle(types.MethodType, _reduce_method)
    

    For more on how to pickle a method, see http://docs.python.org/2/library/copy_reg.html
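
    To see what the reducer actually does, here is a minimal Python 3 sketch of the same idea, where the module is named `copyreg`. The `Greeter` class is hypothetical and exists only for illustration. Python 3 already pickles bound methods in essentially this way, so the registration is shown to make the mechanism visible rather than because it is required.

```python
import copyreg
import pickle
import types


def _reduce_method(meth):
    # Rebuild the bound method at load time as getattr(instance, name).
    return (getattr, (meth.__self__, meth.__func__.__name__))


# Register the reducer for bound methods (redundant on Python 3, where
# methods are already picklable, but harmless).
copyreg.pickle(types.MethodType, _reduce_method)


class Greeter:  # hypothetical class, for illustration only
    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, name):
        return f"{self.greeting}, {name}!"


# Apply the reducer by hand: pickle stores the callable and its arguments,
# then calls rebuild(*args) when loading.
rebuild, args = _reduce_method(Greeter("Hello").greet)
by_hand = rebuild(*args)

# The same thing done through an actual pickle round trip.
via_pickle = pickle.loads(pickle.dumps(Greeter("Hi").greet))
```

    One caveat of looking the method up by `__name__`: a name-mangled method such as `__private` is stored on the class as `_ClassName__private`, so the plain `getattr` would fail for it. The longer `_pickle_method` recipe in the other answer handles that case explicitly.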

  • 2020-12-09 07:07

    If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.

    pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6]))

    See: What can multiprocessing and dill do together?

    and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> 
    >>> p = Pool(4)
    >>> 
    >>> def add(x,y):
    ...   return x+y
    ... 
    >>> x = [0,1,2,3]
    >>> y = [4,5,6,7]
    >>> 
    >>> p.map(add, x, y)
    [4, 6, 8, 10]
    >>> 
    >>> class Test(object):
    ...   def plus(self, x, y): 
    ...     return x+y
    ... 
    >>> t = Test()
    >>> 
    >>> p.map(Test.plus, [t]*4, x, y)
    [4, 6, 8, 10]
    >>> 
    >>> p.map(t.plus, x, y)
    [4, 6, 8, 10]
    

    So you can do exactly what you wanted to do, I believe.

    Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> 
    >>> class MyClass():
    ...   def __init__(self):
    ...     self.my_args = [1,2,3,4]
    ...     self.output = {}
    ...   def my_single_function(self, arg):
    ...     return arg**2
    ...   def my_parallelized_function(self):
    ...     res = p.map(self.my_single_function, self.my_args)
    ...     self.output = dict(zip(self.my_args, res))
    ... 
    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> p = Pool()
    >>> 
    >>> foo = MyClass()
    >>> foo.my_parallelized_function()
    >>> foo.output
    {1: 1, 2: 4, 3: 9, 4: 16}
    >>>
    

    Get the code here: https://github.com/uqfoundation/pathos
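
    If only the multiple-argument map is needed and installing pathos is not an option, the standard library's `Pool.starmap` (Python 3.3+) offers something similar. The sketch below is an assumption about how one might use it, not part of pathos; `parallel_add` and its optional `pool` parameter are hypothetical names introduced for illustration.

```python
import multiprocessing as mp


def add(x, y):
    return x + y


def parallel_add(xs, ys, pool=None):
    """Add xs[i] + ys[i] element-wise; `pool` is an optional existing pool."""
    pairs = list(zip(xs, ys))
    if pool is not None:
        # starmap unpacks each (x, y) pair into add(x, y).
        return pool.starmap(add, pairs)
    # No pool supplied: create a short-lived process pool for this call.
    with mp.Pool() as own_pool:
        return own_pool.starmap(add, pairs)
```

    Called under an `if __name__ == "__main__":` guard, `parallel_add([0, 1, 2, 3], [4, 5, 6, 7])` should return `[4, 6, 8, 10]`, the same result as `p.map(add, x, y)` in the pathos session.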
