Python multiprocessing PicklingError: Can't pickle

故里飘歌 2020-11-22 03:48

I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in IPython shell instead of the regular Pyt

8 Answers
  • 2020-11-22 03:59

    I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you can send far more kinds of objects between processes. The pathos fork can also work directly with multiple-argument functions, which you need for class methods.

    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> p = Pool(4)
    >>> class Test(object):
    ...   def plus(self, x, y): 
    ...     return x+y
    ... 
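    >>> t = Test()
    >>> x = [0, 1, 2, 3]  # sample inputs (assumed; chosen to match the output shown)
    >>> y = [4, 5, 6, 7]
    >>> p.map(t.plus, x, y)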
    [4, 6, 8, 10]
    >>> 
    >>> class Foo(object):
    ...   @staticmethod
    ...   def work(self, x):
    ...     return x+1
    ... 
    >>> f = Foo()
    >>> p.apipe(f.work, f, 100)
    <processing.pool.ApplyResult object at 0x10504f8d0>
    >>> res = _
    >>> res.get()
    101
    

    Get pathos (and if you like, dill) here: https://github.com/uqfoundation

  • 2020-11-22 03:59

    I have found that I can also generate exactly that error output on a perfectly working piece of code by attempting to use the profiler on it.

    Note that this was on Windows (where the forking is a bit less elegant).

    I was running:

    python -m profile -o output.pstats <script> 
    

    I found that removing the profiling removed the error, and re-enabling it brought the error back. It was driving me batty because I knew the code used to work. I was checking whether something had updated pool.py, then had a sinking feeling, eliminated the profiling, and that was it.

    Posting here for the archives in case anybody else runs into it.
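
    If profiling data is still needed, one possible alternative is to profile from inside the script rather than launching it through `python -m profile`. The sketch below uses the standard-library cProfile and pstats modules. `profile_call` and `busy_sum` are hypothetical names, it only measures the calling process (not pool workers), and whether it avoids the error in a given setup is an assumption that was not tested in this answer.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    # Write the five most expensive entries (by cumulative time) to the buffer.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


def busy_sum(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(busy_sum, 1000)
    print(value)
    print(report)
```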

  • 2020-11-22 04:00

    As others have said, multiprocessing can only transfer Python objects that can be pickled to worker processes. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities to transfer data (especially code), as I show below.

    This solution requires only the installation of dill and no other libraries such as pathos:

    import os
    from multiprocessing import Pool
    
    import dill
    
    
    def run_dill_encoded(payload):
        fun, args = dill.loads(payload)
        return fun(*args)
    
    
    def apply_async(pool, fun, args):
        payload = dill.dumps((fun, args))
        return pool.apply_async(run_dill_encoded, (payload,))
    
    
    if __name__ == "__main__":
    
        pool = Pool(processes=5)
    
        # async execution of lambda
        jobs = []
        for i in range(10):
            job = apply_async(pool, lambda a, b: (a, b, a * b), (i, i + 1))
            jobs.append(job)
    
        for job in jobs:
            print(job.get())
        print()
    
        # async execution of static method
    
        class O(object):
    
            @staticmethod
            def calc():
                return os.getpid()
    
        jobs = []
        for i in range(10):
            job = apply_async(pool, O.calc, ())
            jobs.append(job)
    
        for job in jobs:
            print(job.get())
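
    To check in advance whether an object runs into the pickling restriction described at the top of this answer, one can simply try to pickle it with the standard library. This is a minimal sketch: `is_picklable` is a hypothetical helper name, not part of multiprocessing or dill.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    # A built-in function is pickled by reference to its importable name.
    print(is_picklable(abs))
    # A lambda has no importable name, so plain pickle rejects it.
    print(is_picklable(lambda a: a))
```

    Objects that fail this check are the ones that need the dill-encoded payload shown above.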
    
  • 2020-11-22 04:04
    Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
    

    This error also occurs if the model object passed to the async job contains a function as one of its attributes.

    So make sure the model objects you pass do not contain such functions. (In our case we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is the link to the relevant GitHub issue.
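
    One way to carry out that check is to try pickling each attribute of the object separately and report the ones that fail. The sketch below is only an illustration: `find_unpicklable_attrs` and `Record` are hypothetical names, `Record` is a plain class rather than a Django model, and only instance attributes (those returned by `vars`) are inspected, so attributes defined on the class itself are not covered.

```python
import pickle


def find_unpicklable_attrs(obj):
    """Return the names of instance attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, AttributeError, TypeError):
            bad.append(name)
    return bad


class Record:
    """Hypothetical stand-in for a model object; not a Django model."""

    def __init__(self):
        self.title = "example"
        self.count = 3
        self.tracker = lambda value: value  # function stored on the instance


if __name__ == "__main__":
    print(find_unpicklable_attrs(Record()))
```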

  • 2020-11-22 04:07

    This solution requires only the installation of dill and no other libraries such as pathos.

    import dill
    
    
    def apply_packed_function_for_map(packed):
        """
        Unpack dumped function as target function and call it with arguments.
    
        :param packed:
            a tuple (dumped_function, item, args, kwargs) of the dumped
            function and its arguments
        :return:
            result of target function
        """
        # Python 3 removed tuple parameters, so unpack explicitly.
        dumped_function, item, args, kwargs = packed
        target_function = dill.loads(dumped_function)
        res = target_function(item, *args, **kwargs)
        return res
    
    
    def pack_function_for_map(target_function, items, *args, **kwargs):
        """
        Pack function and arguments to object that can be sent from one
        multiprocessing.Process to another. The main problem is:
            «multiprocessing.Pool.map*» or «apply*»
            cannot use class methods or closures.
        It solves this problem with «dill».
        It works with target function as argument, dumps it («with dill»)
        and returns dumped function with arguments of target function.
        For more performance we dump only target function itself
        and don't dump its arguments.
        How to use (pseudo-code):
    
            ~>>> import multiprocessing
            ~>>> images = [...]
            ~>>> pool = multiprocessing.Pool(100500)
            ~>>> features = pool.map(
            ~...     *pack_function_for_map(
            ~...         super(Extractor, self).extract_features,
            ~...         images,
            ~...         type='png',
            ~...         **options
            ~...     )
            ~... )
            ~>>>
    
        :param target_function:
            function that you want to execute like target_function(item, *args, **kwargs).
        :param items:
            list of items for map
        :param args:
            positional arguments for target_function(item, *args, **kwargs)
        :param kwargs:
            named arguments for target_function(item, *args, **kwargs)
        :return: tuple(function_wrapper, dumped_items)
            It returns a tuple with
                * function wrapper that unpacks and calls the target function;
                * list of packed target function and its arguments.
        """
        dumped_function = dill.dumps(target_function)
        dumped_items = [(dumped_function, item, args, kwargs) for item in items]
        return apply_packed_function_for_map, dumped_items
    

    It also works for numpy arrays.

  • 2020-11-22 04:11

    Building on @rocksportrocker's solution, it makes sense to use dill both when sending the payload and when receiving the results.

    import dill
    import itertools
    def run_dill_encoded(payload):
        fun, args = dill.loads(payload)
        res = fun(*args)
        res = dill.dumps(res)
        return res
    
    def dill_map_async(pool, fun, args_list,
                       as_tuple=True,
                       **kw):
        if as_tuple:
            args_list = ((x,) for x in args_list)
    
        # zip/map are lazy in Python 3 (replacing itertools.izip/imap)
        it = zip(itertools.cycle([fun]), args_list)
        it = map(dill.dumps, it)
        return pool.map_async(run_dill_encoded, it, **kw)
    
    if __name__ == '__main__':
        import multiprocessing as mp
        import sys,os
        p = mp.Pool(4)
        res = dill_map_async(p, lambda x:[sys.stdout.write('%s\n'%os.getpid()),x][-1],
                      [lambda x:x+1]*10,)
        res = res.get(timeout=100)
        res = list(map(dill.loads, res))
        print(res)
    