Python: Efficient workaround for multiprocessing a function that is a data member of a class, from within that class

春和景丽 2020-12-09 06:36

I'm aware of various discussions of the limitations of the multiprocessing module when dealing with functions that are data members of a class (due to pickling problems).
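The pickling problem mentioned here is easy to reproduce with the standard library alone. Below is a minimal sketch, assuming a reasonably recent Python 3; `is_picklable`, `Squarer` and `make_adder` are hypothetical names made up for illustration. On Python 3, `pickle` accepts a bound method of a module-level class (it stores the instance plus the method name), but it still rejects lambdas and nested functions, because it serializes functions by their importable name. Python 2.7, which the sessions below use, could not pickle bound methods at all.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard-library pickler can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Different Python versions raise different exceptions here.
        return False
    return True


class Squarer:
    # Defined at module level, so pickle can find the class by name.
    def square(self, x):
        return x ** 2


def make_adder(n):
    # The nested function has no importable module-level name.
    def add(x):
        return x + n
    return add


if __name__ == "__main__":
    print(is_picklable(Squarer().square))  # True: bound method, module-level class
    print(is_picklable(lambda x: x + 1))   # False: lambda
    print(is_picklable(make_adder(1)))     # False: nested function (closure)
```

Third-party serializers such as dill serialize lambdas and closures by value instead, which is the gap a dill-based pool can fill.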

3 Answers
  •  醉梦人生
    2020-12-09 07:07

    If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.

    pathos.multiprocessing also provides an asynchronous map function, and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).
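For comparison, the standard library's `multiprocessing.Pool` (Python 3.3+) has rough counterparts to these two features, with the caveat that everything it sends to workers must still be picklable by plain `pickle`: `starmap` unpacks argument tuples for multi-argument functions, and `starmap_async` returns an `AsyncResult` whose `get()` blocks until the results are ready. This is a sketch, not pathos's API; `parallel_pow`, `parallel_pow_async` and the `processes=2` default are hypothetical choices.

```python
import math
from multiprocessing import Pool


def parallel_pow(bases, exponents, processes=2):
    """Compute math.pow(b, e) for each (b, e) pair in worker processes."""
    with Pool(processes) as pool:
        # starmap unpacks each (b, e) tuple into math.pow(b, e).
        return pool.starmap(math.pow, zip(bases, exponents))


def parallel_pow_async(bases, exponents, processes=2):
    """Same computation, submitted asynchronously with starmap_async."""
    with Pool(processes) as pool:
        async_result = pool.starmap_async(math.pow, zip(bases, exponents))
        # Other work could happen here; get() blocks until results arrive.
        return async_result.get(timeout=30)


if __name__ == "__main__":
    print(parallel_pow([1, 2, 3], [4, 5, 6]))        # [1.0, 32.0, 729.0]
    print(parallel_pow_async([1, 2, 3], [4, 5, 6]))  # [1.0, 32.0, 729.0]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with the spawn method, which re-runs the main module in every child.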

    See: What can multiprocessing and dill do together?

    and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> 
    >>> p = Pool(4)
    >>> 
    >>> def add(x,y):
    ...   return x+y
    ... 
    >>> x = [0,1,2,3]
    >>> y = [4,5,6,7]
    >>> 
    >>> p.map(add, x, y)
    [4, 6, 8, 10]
    >>> 
    >>> class Test(object):
    ...   def plus(self, x, y): 
    ...     return x+y
    ... 
    >>> t = Test()
    >>> 
    >>> p.map(Test.plus, [t]*4, x, y)
    [4, 6, 8, 10]
    >>> 
    >>> p.map(t.plus, x, y)
    [4, 6, 8, 10]
    

    So you can do exactly what you wanted to do, I believe.

    Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> 
    >>> class MyClass():
    ...   def __init__(self):
    ...     self.my_args = [1,2,3,4]
    ...     self.output = {}
    ...   def my_single_function(self, arg):
    ...     return arg**2
    ...   def my_parallelized_function(self):
    ...     res = p.map(self.my_single_function, self.my_args)
    ...     self.output = dict(zip(self.my_args, res))
    ... 
    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> p = Pool()
    >>> 
    >>> foo = MyClass()
    >>> foo.my_parallelized_function()
    >>> foo.output
    {1: 1, 2: 4, 3: 9, 4: 16}
    >>>
    

    Get the code here: https://github.com/uqfoundation/pathos
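For what it's worth, on a reasonably recent Python 3 the standard library can handle this particular pattern too, since `pool.map` can pickle `self.my_single_function` as long as `MyClass` is defined at module level and its attributes are picklable. Below is a minimal sketch that mirrors the MyClass session above, assuming Python 3; the `processes` parameter is a hypothetical addition, and the pool is created inside the method rather than as a global `p`.

```python
from multiprocessing import Pool


class MyClass:
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}

    def my_single_function(self, arg):
        return arg ** 2

    def my_parallelized_function(self, processes=2):
        # The pool is a local variable, not an attribute: Pool objects cannot
        # be pickled, and pool.map pickles self along with the bound method.
        with Pool(processes) as pool:
            res = pool.map(self.my_single_function, self.my_args)
        self.output = dict(zip(self.my_args, res))


if __name__ == "__main__":
    foo = MyClass()
    foo.my_parallelized_function()
    print(foo.output)  # {1: 1, 2: 4, 3: 9, 4: 16}
```

Each worker gets a pickled copy of the instance, so attribute changes made inside `my_single_function` do not reach the parent; only return values come back. Passing a lambda or a nested function to `pool.map` would still fail here, which is where pathos and dill remain useful.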
