Python: can't pickle module objects error

不思量自难忘° 2020-12-16 13:24

I'm trying to pickle a big class and getting

TypeError: can't pickle module objects

despite looking around the web, I can't e

4 answers
  • 2020-12-16 13:48

    I can reproduce the error message this way:

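    import pickle  # on Python 2 this was cPickle
    
    class Foo(object):
        def __init__(self):
            # Storing a module on the instance is what makes it unpicklable
            self.mod = pickle
    
    foo = Foo()
    with open('/tmp/test.out', 'wb') as f:
        pickle.dump(foo, f)
    
    # TypeError: can't pickle module objects
    # (newer Python 3 releases word this as: cannot pickle 'module' object)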
    

    Does the object you are pickling have an attribute that references a module?
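
    One way to check is to scan the instance's attributes for module objects. The sketch below is only an illustration: the helper name module_attributes is made up here, and it only looks at the top-level attributes stored in the instance's __dict__.

```python
import pickle
import types


def module_attributes(obj):
    """Return the names of instance attributes whose value is a module."""
    return [
        name
        for name, value in vars(obj).items()
        if isinstance(value, types.ModuleType)
    ]


class Foo(object):
    def __init__(self):
        self.mod = pickle   # a module: this is what breaks pickling
        self.value = 42     # plain data: pickles fine


foo = Foo()
print(module_attributes(foo))  # ['mod']
```

    Any name it reports is a candidate for removal or for one of the workarounds in the other answers.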

  • 2020-12-16 13:50

    Python's inability to pickle module objects is the real problem. Is there a good reason for it? I don't think so. Because module objects are unpicklable, Python is more fragile than it needs to be as a language for parallel and asynchronous work. If you want to pickle module objects, or almost anything else in Python, use dill.

    Python 3.2.5 (default, May 19 2013, 14:25:55) 
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> import os
    >>> dill.dumps(os)
    b'\x80\x03cdill.dill\n_import_module\nq\x00X\x02\x00\x00\x00osq\x01\x85q\x02Rq\x03.'
    >>>
    >>>
    >>> # and for parlor tricks...
    >>> class Foo(object):
    ...   x = 100
    ...   def __call__(self, f):
    ...     def bar(y):
    ...       return f(self.x) + y
    ...     return bar
    ... 
    >>> @Foo()
    ... def do_thing(x):
    ...   return x
    ... 
    >>> do_thing(3)
    103 
    >>> dill.loads(dill.dumps(do_thing))(3)
    103
    >>> 
    

    Get dill here: https://github.com/uqfoundation/dill

  • 2020-12-16 14:00

    Recursively Find Pickle Failure

    Inspired by wump's comment: Python: can't pickle module objects error

    Here is some quick code that helped me find the culprit recursively.

    It checks the object in question to see if it fails pickling.

    If it does, the function recurses into the values stored in the object's __dict__ and returns only the ones that fail to pickle.

    Code Snippet

    import pickle
    
    def pickle_trick(obj, max_depth=10):
        output = {}
    
        if max_depth <= 0:
            return output
    
        try:
            pickle.dumps(obj)
        except (pickle.PicklingError, TypeError) as e:
            failing_children = []
    
            if hasattr(obj, "__dict__"):
                for k, v in obj.__dict__.items():
                    result = pickle_trick(v, max_depth=max_depth - 1)
                    if result:
                        failing_children.append(result)
    
            output = {
                "fail": obj, 
                "err": e, 
                "depth": max_depth, 
                "failing_children": failing_children
            }
    
        return output
    
    

    Example Program

    import redis
    
    import pickle
    from pprint import pformat as pf
    
    
    def pickle_trick(obj, max_depth=10):
        output = {}
    
        if max_depth <= 0:
            return output
    
        try:
            pickle.dumps(obj)
        except (pickle.PicklingError, TypeError) as e:
            failing_children = []
    
            if hasattr(obj, "__dict__"):
                for k, v in obj.__dict__.items():
                    result = pickle_trick(v, max_depth=max_depth - 1)
                    if result:
                        failing_children.append(result)
    
            output = {
                "fail": obj, 
                "err": e, 
                "depth": max_depth, 
                "failing_children": failing_children
            }
    
        return output
    
    
    if __name__ == "__main__":
        r = redis.Redis()
        print(pf(pickle_trick(r)))
    
    

    Example Output

    $ python3 pickle-trick.py
    {'depth': 10,
     'err': TypeError("can't pickle _thread.lock objects"),
     'fail': Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>,
     'failing_children': [{'depth': 9,
                           'err': TypeError("can't pickle _thread.lock objects"),
                           'fail': ConnectionPool<Connection<host=localhost,port=6379,db=0>>,
                           'failing_children': [{'depth': 8,
                                                 'err': TypeError("can't pickle _thread.lock objects"),
                                                 'fail': <unlocked _thread.lock object at 0x10bb58300>,
                                                 'failing_children': []},
                                                {'depth': 8,
                                                 'err': TypeError("can't pickle _thread.RLock objects"),
                                                 'fail': <unlocked _thread.RLock object owner=0 count=0 at 0x10bb58150>,
                                                 'failing_children': []}]},
                          {'depth': 9,
                           'err': PicklingError("Can't pickle <function Redis.<lambda> at 0x10c1e8710>: attribute lookup Redis.<lambda> on redis.client failed"),
                           'fail': {'ACL CAT': <function Redis.<lambda> at 0x10c1e89e0>,
                                    'ACL DELUSER': <class 'int'>,
                                    .........
                                    'ZSCORE': <function float_or_none at 0x10c1e5d40>},
                           'failing_children': []}]}
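
    The nested output above can be hard to scan. As a variation on the same idea (not part of the original snippet), the sketch below returns dotted attribute paths for the deepest failing values. The function name unpicklable_paths and the Pool and Client classes are hypothetical stand-ins, so the example runs without redis.

```python
import pickle
import threading


def unpicklable_paths(obj, path="obj", max_depth=10):
    """Return dotted attribute paths of the deepest values that fail to pickle."""
    if max_depth <= 0:
        return []
    try:
        pickle.dumps(obj)
        return []  # this value pickles fine, nothing to report
    except Exception:  # pickle may raise TypeError, PicklingError, AttributeError, ...
        pass

    paths = []
    for name, value in getattr(obj, "__dict__", {}).items():
        paths.extend(unpicklable_paths(value, f"{path}.{name}", max_depth - 1))
    # No failing child was found, so this object itself is the culprit.
    return paths or [path]


class Pool:      # hypothetical stand-in for a connection pool
    def __init__(self):
        self.lock = threading.Lock()


class Client:    # hypothetical stand-in for a client object
    def __init__(self):
        self.pool = Pool()
        self.name = "demo"


print(unpicklable_paths(Client()))  # ['obj.pool.lock']
```

    Like the original, it only follows __dict__, so values hidden inside lists, dicts or __slots__ are reported at the level of their container.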
    

    Root Cause - Redis can't pickle _thread.lock

    In my case, creating an instance of Redis that I saved as an attribute of an object broke pickling.

    When you create an instance of Redis, it also creates a connection_pool that holds thread locks, and thread locks cannot be pickled.

    I had to create and clean up the Redis instance inside the multiprocessing.Process, instead of keeping it as an attribute on the object at the time the object was pickled.
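
    A minimal sketch of that idea, under stated assumptions: FakeClient is a hypothetical stand-in for the Redis client (it just holds a threading.Lock), and Worker is an invented class. The client is created lazily on first use, so it comes into existence inside whichever process uses it, and __getstate__ drops it whenever the object is pickled.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a client that holds a thread lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def ping(self):
        with self._lock:
            return True


class Worker:
    def __init__(self, host):
        self.host = host      # plain, picklable configuration
        self._client = None   # the client is NOT created here

    @property
    def client(self):
        # Created on first use, i.e. inside whichever process calls it.
        if self._client is None:
            self._client = FakeClient()
        return self._client

    def __getstate__(self):
        # Never ship a live client across a process boundary.
        state = self.__dict__.copy()
        state["_client"] = None
        return state


worker = Worker("localhost")
worker.client.ping()                         # client now exists in this process
clone = pickle.loads(pickle.dumps(worker))   # still pickles: the client is dropped
print(clone.host, clone._client)             # localhost None
```

    The unpickled copy builds its own client the first time clone.client is accessed.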

    Testing

    In my case, the class I was trying to pickle must remain picklable, so I added a unit test that creates an instance of the class and pickles it. That way, if anyone modifies the class so that it can no longer be pickled, breaking its ability to be used in multiprocessing (and pyspark), we detect the regression straight away.

    import pickle
    
    def test_can_pickle():
        # Given
        obj = MyClassThatMustPickle()  # the class under test
    
        # When / Then
        # pickle.dumps raises if the object can no longer be pickled,
        # which fails the test
        pickle.dumps(obj)
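
    A slightly stricter variant, sketched here as an assumption rather than as what the original test did, also unpickles the object and compares its attributes across every pickle protocol. Point is a hypothetical example class and assert_round_trips is an invented helper name.

```python
import pickle


class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x, self.y = x, y


def assert_round_trips(obj):
    """Pickle and unpickle obj with every protocol and compare attributes."""
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        restored = pickle.loads(pickle.dumps(obj, protocol=protocol))
        assert type(restored) is type(obj)
        assert vars(restored) == vars(obj)


def test_point_round_trips():
    assert_round_trips(Point(1, 2))
```

    Comparing vars() only makes sense for classes whose state lives in plain, comparable attributes.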
    
    
  • 2020-12-16 14:03

    According to the documentation:

    What can be pickled and unpickled?

    The following types can be pickled:

    • None, True, and False

    • integers, floating point numbers, complex numbers

    • strings, bytes, bytearrays

    • tuples, lists, sets, and dictionaries containing only picklable objects

    • functions defined at the top level of a module (using def, not lambda)

    • built-in functions defined at the top level of a module

    • classes that are defined at the top level of a module

    • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).

    As you can see, modules are not on this list. Note that the same restriction applies to deepcopy, not only to the pickle module, as stated in the documentation of deepcopy:

    This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types. It does “copy” functions and classes (shallow and deeply), by returning the original object unchanged; this is compatible with the way these are treated by the pickle module.
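
    A small sketch to illustrate the deepcopy point, using the standard math module as an arbitrary example and a made-up Holder class: a shallow copy works because the module reference is simply shared, while a deep copy fails in the same way pickling does.

```python
import copy
import math


class Holder:
    def __init__(self):
        self.mod = math   # any module would do


holder = Holder()

shallow = copy.copy(holder)   # works: the module reference is shared
print(shallow.mod is math)    # True

try:
    copy.deepcopy(holder)     # tries to copy the module itself
    deep_copy_failed = False
except TypeError as exc:
    deep_copy_failed = True
    print("deepcopy failed:", exc)
```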

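    A possible workaround is to expose the module through a @property instead of storing it as an instance attribute. The property lives on the class, so the module never ends up in the instance's __dict__ and is not part of the pickled state. For example, this should work: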

        import numpy as np
        import pickle
    
        class Foo():
            @property
            def module(self):
                return np
    
        foo = Foo()
        with open('test.out', 'wb') as f:
            pickle.dump(foo, f)
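
    If the module has to stay a real instance attribute, another option is the __getstate__() hook mentioned in the documentation quote above. The sketch below is one possible approach, not the only one: it stores the module's name in the pickled state and re-imports it in __setstate__. The json module is used purely as an example.

```python
import importlib
import json   # any importable module works here; json is just an example
import pickle


class Foo:
    def __init__(self):
        self.mod = json

    def __getstate__(self):
        state = self.__dict__.copy()
        state["mod"] = self.mod.__name__   # store 'json' instead of the module
        return state

    def __setstate__(self, state):
        state = dict(state)
        state["mod"] = importlib.import_module(state["mod"])
        self.__dict__.update(state)


restored = pickle.loads(pickle.dumps(Foo()))
print(restored.mod is json)  # True
```

    This assumes the module can be imported under the same name in the process that unpickles the object.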
    