AttributeError when reading a pickle file

迷失自我 2020-12-28 20:18

I get the following error when I'm reading my .pkl files in Spyder (Python 3.6.5):

    import pickle

    with open(file, "rb") as f:
        data = pickle.load(f)

Traceba         


        
1 Answer

Answered 2020-12-28 20:33

    When you dump objects into a pickle, you should avoid pickling instances of classes, and functions, that are declared in the main module. Your problem is caused (in part) by your program consisting of a single file. pickle does not serialize class definitions or function definitions. Instead, it stores a reference that says how to find the class: the name of the module it lives in and the class's own name.
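    A quick way to see this by-reference behaviour is sketched below with standard-library objects (fractions.Fraction and json.dumps are chosen only because they are importable everywhere, not because they relate to your program). Unpickling a pickled class or function hands back the very same object, because only its module and name were stored.

```python
import json
import pickle
from fractions import Fraction

# Pickle a class object and a function object.
class_bytes = pickle.dumps(Fraction)
func_bytes = pickle.dumps(json.dumps)

# The bytes hold only the module and name, not the definitions.
print(class_bytes)

# Loading looks the names up again, so we get back the identical objects.
print(pickle.loads(class_bytes) is Fraction)   # True
print(pickle.loads(func_bytes) is json.dumps)  # True
```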

    When Python runs a script directly, it runs it as the __main__ module, regardless of its actual file name. However, when a file is loaded as something other than the main module (e.g. when you do import program), its module name is derived from its file name. So program.py becomes the module program.

    When you run your program from the command line, you are doing the former, so the module is called __main__. As a result, pickle records references to your classes such as __main__.Signal. When Spyder later loads the pickle file, pickle is told to import __main__ and look up Signal there. But Spyder's __main__ module is the module that was used to start Spyder, not your program.py, so pickle cannot find Signal and raises the AttributeError.
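    The failure can be reproduced in a single file. This is only an illustration: demo_mod is a hypothetical in-memory module standing in for the place pickle was told to look (your __main__ at dump time, Spyder's __main__ at load time). Once that module no longer has a Signal attribute, loading fails with the same kind of AttributeError.

```python
import pickle
import sys
import types

# Hypothetical in-memory module that plays the role of __main__.
demo_mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo_mod


class Signal:
    def __init__(self, value):
        self.value = value


# Make Signal look as if it were defined in demo_mod.
Signal.__module__ = "demo_mod"
demo_mod.Signal = Signal

# Dumping works: pickle records the reference demo_mod.Signal.
data = pickle.dumps(Signal(42))

# Simulate loading where demo_mod exists but has no Signal
# (like Spyder's own __main__ module).
del demo_mod.Signal

try:
    pickle.loads(data)
    load_failed = False
except AttributeError as exc:
    load_failed = True
    print("AttributeError:", exc)
```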

    You can inspect the contents of a pickle file with the command below (-a prints a description of each opcode). Its output shows that your class is referenced as __main__.Signal.

    python -m pickletools -a file.pkl
    

    And you'll see something like:

        0: \x80 PROTO      3              Protocol version indicator.
        2: c    GLOBAL     '__main__ Signal' Push a global object (module.attr) on the stack.
       19: q    BINPUT     0                 Store the stack top into the memo.  The stack is not popped.
       21: )    EMPTY_TUPLE                  Push an empty tuple.
       22: \x81 NEWOBJ                       Build an object instance.
       23: q    BINPUT     1                 Store the stack top into the memo.  The stack is not popped.
       ...
       51: b    BUILD                        Finish building an object, via __setstate__ or dict update.
       52: .    STOP                         Stop the unpickling machine.
    highest protocol among opcodes = 2
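    The same kind of listing can also be produced from inside Python with pickletools.dis, whose annotate argument corresponds to the -a flag. In this sketch a fractions.Fraction stands in for your own class (an assumption made so the example runs anywhere), and protocol=3 is chosen so the reference appears as a GLOBAL opcode like the one above; protocols 4 and later emit STACK_GLOBAL instead.

```python
import io
import pickle
import pickletools
from fractions import Fraction

# protocol=3 makes pickle emit the GLOBAL opcode shown in the listing above.
data = pickle.dumps(Fraction(1, 3), protocol=3)

# In-process equivalent of `python -m pickletools -a file.pkl`.
buf = io.StringIO()
pickletools.dis(data, out=buf, annotate=1)
listing = buf.getvalue()
print(listing)

# The class is stored only as the reference "fractions Fraction".
print("'fractions Fraction'" in listing)  # True
```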
    

    Solutions

    There are a number of solutions available to you:

    1. Don't serialise instances of classes that are defined in your __main__ module. This is the easiest and best solution. Instead, move these classes into another module, or write a separate main.py script that imports and invokes your program (either way, the classes are no longer found in the __main__ module).
    2. Write a custom deserialiser
    3. Write a custom serialiser
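    Solution 1 can be sketched in one runnable block. The module name mysignals is hypothetical, and the temporary directory exists only to keep the example self-contained; in a real project mysignals.py would simply sit next to your script (or main.py), and you would write from mysignals import Signal.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of a separate file, mysignals.py.
MODULE_SOURCE = """
class Signal:
    def __init__(self, value):
        self.value = value
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "mysignals.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Equivalent to `from mysignals import Signal` in your script.
        mysignals = importlib.import_module("mysignals")
        data = pickle.dumps(mysignals.Signal(42))
        # Any process that can import mysignals can load the data,
        # whatever its own __main__ module happens to be.
        restored = pickle.loads(data)
    finally:
        sys.path.remove(tmp)

print(mysignals.Signal.__module__)  # mysignals
print(restored.value)               # 42
```

    Because the class now lives in an ordinary importable module, the pickle refers to mysignals.Signal rather than __main__.Signal, and Spyder only needs mysignals to be importable.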

    The following solutions work with a pickle file called out.pkl, created by this code (saved in a file called program.py):

    import pickle
    
    class MyClass:
        def __init__(self, name):
            self.name = name
    
    if __name__ == '__main__':
        o = MyClass('test')
        with open('out.pkl', 'wb') as f:
            pickle.dump(o, f)
    

    The Custom Deserialiser Solution

    You can write a custom deserialiser that knows that, whenever it encounters a reference to the __main__ module, what you really mean is the program module.

    import pickle
    
    class MyCustomUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if module == "__main__":
                module = "program"
            return super().find_class(module, name)
    
    with open('out.pkl', 'rb') as f:
        unpickler = MyCustomUnpickler(f)
        obj = unpickler.load()
    
    print(obj)
    print(obj.name)
    

    This is the easiest way to load pickle files that have already been created. The problem is that it pushes the responsibility onto the deserialising code, when it should really be the responsibility of the serialising code to create pickle files correctly.
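    If several module names need redirecting, the same find_class override can be driven by a mapping. This is a minimal sketch, not a library API: RenamingUnpickler, load_renamed and the module name old_program are hypothetical, and a hand-written protocol-0 pickle that calls old_program.Fraction('1/3') stands in for a file produced by a module that no longer exists. For the case in the question the mapping would be {"__main__": "program"}.

```python
import io
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups from one module name to another.

    `renames` maps the module name stored in the pickle to the module
    that should be searched instead, e.g. {"__main__": "program"}.
    """

    def __init__(self, file, renames):
        super().__init__(file)
        self.renames = dict(renames)

    def find_class(self, module, name):
        module = self.renames.get(module, module)
        return super().find_class(module, name)


def load_renamed(data, renames):
    return RenamingUnpickler(io.BytesIO(data), renames).load()


# Hand-written protocol-0 pickle that calls old_program.Fraction('1/3').
data = b"cold_program\nFraction\n(S'1/3'\ntR."

try:
    pickle.loads(data)
    plain_failed = False
except ModuleNotFoundError as exc:
    plain_failed = True
    print("plain pickle.loads fails:", exc)

value = load_renamed(data, {"old_program": "fractions"})
print(value)  # 1/3
```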

    The Custom Serialiser Solution

    In contrast to the previous solution, you can make sure that serialised pickle objects can be deserialised easily by anyone, without them having to know any custom deserialisation logic. To do this, use the copyreg module to register a reduction function that tells pickle how each class should be reconstructed. Here, you tell pickle to serialise every instance of a __main__ class as if it were an instance of the corresponding program class. You will need to register a custom serialiser for each class:

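    # program.py (revised). When run as a script this file is __main__;
    # importing it as "program" gives access to program.MyClass.
    import program
    import pickle
    import copyreg
    
    class MyClass:
        def __init__(self, name):
            self.name = name
    
    # Reduction function: record each MyClass instance as a call to
    # program.MyClass(name) instead of a reference to __main__.MyClass
    def pickle_MyClass(obj):
        assert type(obj) is MyClass
        return program.MyClass, (obj.name,)
    
    # Register the reducer in copyreg's global dispatch table
    copyreg.pickle(MyClass, pickle_MyClass)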
    
    if __name__ == '__main__':
        o = MyClass('test')
        with open('out.pkl', 'wb') as f:
            pickle.dump(o, f)
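    To try the copyreg approach without creating program.py, the sketch below builds a hypothetical in-memory module named program_demo to play the role of program. The pickle produced for a local MyClass instance then refers to program_demo.MyClass, so the loaded object is an instance of that class rather than of the local one.

```python
import copyreg
import pickle
import sys
import types

# Hypothetical in-memory module standing in for program.py.
PROGRAM_SOURCE = """
class MyClass:
    def __init__(self, name):
        self.name = name
"""
program = types.ModuleType("program_demo")
exec(PROGRAM_SOURCE, program.__dict__)  # MyClass.__module__ == "program_demo"
sys.modules["program_demo"] = program


# Local copy of the class, as it would exist in the __main__ module.
class MyClass:
    def __init__(self, name):
        self.name = name


def pickle_MyClass(obj):
    # Record the object as a call to program_demo.MyClass(name).
    assert type(obj) is MyClass
    return program.MyClass, (obj.name,)


copyreg.pickle(MyClass, pickle_MyClass)

data = pickle.dumps(MyClass("test"))
restored = pickle.loads(data)

print(type(restored).__module__, restored.name)  # program_demo test
```

    copyreg.pickle changes the process-wide table. If you would rather keep the change local, pickle.Pickler instances also accept a per-pickler dispatch_table attribute, described in the pickle documentation on dispatch tables.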
    