Python multiprocessing PicklingError: Can't pickle

故里飘歌 2020-11-22 03:48

I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in IPython shell instead of the regular Pyt

8 Answers
  • 2020-11-22 04:18

    The Python documentation for pickle lists what can be pickled. In particular, functions are picklable only if they are defined at the top level of a module.

    This piece of code:

    import multiprocessing as mp
    
    class Foo():
        @staticmethod
        def work(self):
            pass
    
    if __name__ == '__main__':   
        pool = mp.Pool()
        foo = Foo()
        pool.apply_async(foo.work)
        pool.close()
        pool.join()
    

    yields an error almost identical to the one you posted:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 505, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
    

    The problem is that the pool methods all use an mp.SimpleQueue to pass tasks to the worker processes. Everything that goes through the mp.SimpleQueue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.
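
One way to check in advance whether a callable would survive that queue is to try pickling it directly. The sketch below is written for Python 3 (the traceback above is from Python 2.7, so the exact error types differ); `is_picklable`, `top_level`, and `make_nested` are hypothetical names introduced only for illustration.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type varies with the object and the Python version.
        return False
    return True


def top_level(x):
    """Module-level function: pickled by reference to its qualified name."""
    return x + 1


def make_nested():
    """Return a function defined inside another function."""
    def nested(x):
        return x + 1
    return nested


if __name__ == "__main__":
    print("top-level function:", is_picklable(top_level))
    print("nested function:   ", is_picklable(make_nested()))
    print("lambda:            ", is_picklable(lambda x: x + 1))
```

Functions are pickled by reference (module plus qualified name), not by value, which is why a nested function or a lambda has nothing the receiving process could look up.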

    It can be fixed by defining a function at the top level, which calls foo.work():

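    # Module-level wrapper: picklable because it can be looked up by name.
    def work(foo):
        # Foo.work is a staticmethod declared with one parameter,
        # so foo has to be passed explicitly.
        foo.work(foo)
    
    pool.apply_async(work, args=(foo,))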
    

    Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
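
Putting the pieces together, a complete version of the fix might look like the sketch below. It assumes Python 3; the `value` attribute, the return value of `Foo.work`, and the 30-second timeout are hypothetical additions so that the result is observable, and `Foo.work` is written here as a regular method rather than the static method used in the reproduction above.

```python
import multiprocessing as mp


class Foo:
    """Top-level class, so its instances can be pickled."""

    def __init__(self, value):
        self.value = value  # hypothetical attribute, for illustration only

    def work(self):
        return self.value * self.value


def work(foo):
    """Module-level wrapper that the pool can pickle by name."""
    return foo.work()


if __name__ == "__main__":
    foo = Foo(7)
    with mp.Pool(processes=2) as pool:
        # Both the wrapper and foo are pickled and sent to a worker process.
        result = pool.apply_async(work, args=(foo,))
        # get() re-raises any exception from the worker instead of hiding it.
        print(result.get(timeout=30))
```

Calling `get()` on the returned `AsyncResult` is worth doing even when the return value is not needed, since `apply_async` on its own does not surface errors raised in the worker.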

  • 2020-11-22 04:24

    When this problem comes up with multiprocessing, a simple solution is to switch from Pool to ThreadPool. This requires no code change other than the import:

    from multiprocessing.pool import ThreadPool as Pool
    

    This works because a ThreadPool runs its workers as threads that share memory with the main thread, rather than creating new processes, so nothing needs to be pickled.
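
As a small illustration of that point, the sketch below passes a lambda, which cannot be pickled, to a `ThreadPool`. The `square_all` helper, the worker count, and the squaring task are arbitrary choices made for the example.

```python
from multiprocessing.pool import ThreadPool


def square_all(values, workers=4):
    """Square each value using a pool of threads."""
    with ThreadPool(workers) as pool:
        # The lambda never leaves this process, so it is not pickled.
        return pool.map(lambda x: x * x, values)


if __name__ == "__main__":
    print(square_all(range(5)))
```

The same call with a process-based `Pool` would need a module-level function in place of the lambda.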

    The downside of this method is that Python is not the strongest language for threading: it uses the Global Interpreter Lock (GIL) to stay thread-safe, which can slow down some use cases. However, if you are primarily interacting with other systems (making HTTP requests, talking to a database, writing to the filesystem), your code is likely not CPU-bound and won't take much of a hit. In fact, when writing HTTP/HTTPS benchmarks I have found that the threaded model has less overhead and fewer delays, since creating new processes costs much more than creating new threads.

    So if you are doing a lot of CPU-heavy processing in Python itself, this might not be the best method.
