Question
After a class definition is updated by recompiling a script, pickle refuses to serialize previously instantiated objects of that class, giving the error: "Can't pickle object: it's not the same object as "
Is there a way to tell pickle to ignore such cases, so that it identifies classes by name alone and ignores whichever internal unique ID is causing the mismatch?
I would also welcome, as an answer, a suggestion of an alternative, equivalent module that solves this problem in a convenient and robust manner.
For reference, here's my motivation:
I am creating a high productivity, rapid iteration development environment in which Python scripts are edited live. Scripts are repeatedly recompiled, but data persists across compiles. As part of the productivity goals, I am trying to use pickle for serialization, to avoid the cost of writing and updating explicit serialization code for constantly changing data structures.
Mostly I serialize built-in types. I am careful to avoid meaningful changes in the classes which I pickle, and when necessary I use the copy_reg.pickle mechanism (copyreg.pickle in Python 3) to perform upconversion on unpickle.
Script recompilation prevents me from pickling objects at all, even if class definitions have not actually changed (or have only changed in a benign way).
Answer 1:
Unless you can unpack the earlier version of the class definition, the reference pickle needs to dump and load the instance is now gone. So this is "not possible".
However, if you did want to do it, you could save previous versions of your class definitions. You would then have to trick pickle into referring to your old, saved class definitions instead of the most current ones, which might just amount to editing obj.__class__ or obj.__module__ to point to your old class. There may also be other odd things in your class instance that refer to the old class definition, and you would have to handle those too. Also, if you add or delete a class method, you may run into some unexpected results, or have to update the instance accordingly. Another interesting twist is that you could make the unpickler always use the most current version of your class.
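To make "the reference is now gone" a bit more concrete: when pickle saves a class by reference, it looks the class up again by module and qualified name and refuses if it gets back a different object. The snippet below is only a rough sketch of that check, not pickle's actual code. The module name `live_script` and the helper `resolves_to_itself` are made up for illustration, and recompilation is simulated with `exec` on a throwaway module.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a live-edited script.
module = types.ModuleType("live_script")
sys.modules[module.__name__] = module

SOURCE = "class Foo:\n    pass\n"

exec(SOURCE, module.__dict__)   # first compile
old_foo = module.Foo()          # instance of the first version of Foo
exec(SOURCE, module.__dict__)   # "recompile": the name Foo is rebound to a new class object


def resolves_to_itself(cls):
    """Approximate the identity check pickle applies to classes saved by reference."""
    owner = sys.modules.get(cls.__module__)
    return getattr(owner, cls.__qualname__, None) is cls


print(resolves_to_itself(type(old_foo)))  # False: the name now points at the new class
print(resolves_to_itself(module.Foo))     # True

try:
    pickle.dumps(old_foo)
except pickle.PicklingError as exc:
    print("PicklingError:", exc)
```

Note that the source text is identical in both compiles. The old and new classes are still distinct objects, which is why the error shows up even when nothing in the class has changed.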
My serialization package, dill, has some methods that can dump compiled source from a live code object to a temporary file, and then serialize using that temporary file. It's one of the newer parts of the package, so it's not as robust as the rest of dill. Also, your use case is not a use case I'd considered, but I could see how it would be a nice feature to have.
Answer 2:
There is a simple way to do it that is basically User's answer.
First I will give the failing code:
# Tested with Python 3.6.7
import pickle

class Foo:
    pass

foo = Foo()

class Foo:
    def bar(self):
        return 0

pickle.dumps(foo)  # raises PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo
To fix this problem, just reset the __class__ attribute of foo before pickling, as in User's answer:
import pickle

class Foo:
    pass

foo = Foo()

class Foo:
    def bar(self):
        return 0

foo.__class__ = eval(foo.__class__.__name__)  # reset __class__ attribute
pickle.dumps(foo)  # works fine
This solution only works if you truly want pickle to ignore any differences between the two versions of the class. If the two versions have significant differences, I don't expect this solution to work.
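A variant of the same idea that avoids `eval`, offered as a minimal sketch: look the class up through `sys.modules` using the instance's own `__module__` and `__qualname__`, so it also works when the class name is not visible in the calling scope. The helper name `rebind_to_current_class` and the module name `live_script_demo` are hypothetical. It assumes the class is defined at module level (or nested inside another class), and that the old and new versions have compatible layouts, since assigning `__class__` raises `TypeError` otherwise.

```python
import pickle
import sys
import types


def rebind_to_current_class(obj):
    """Point obj.__class__ at whatever its module currently defines under the same name."""
    old_cls = type(obj)
    target = sys.modules[old_cls.__module__]
    for part in old_cls.__qualname__.split("."):  # also walks nested classes
        target = getattr(target, part)
    if target is not old_cls:
        obj.__class__ = target
    return obj


# Simulate a live-edited script with a throwaway module (the name is made up).
script = types.ModuleType("live_script_demo")
sys.modules[script.__name__] = script

exec("class Foo:\n    pass\n", script.__dict__)
foo = script.Foo()
foo.value = 42

# Recompile with a benign change: a new method.
exec("class Foo:\n    def bar(self):\n        return 0\n", script.__dict__)

rebind_to_current_class(foo)
restored = pickle.loads(pickle.dumps(foo))
print(type(restored) is script.Foo, restored.value, restored.bar())  # True 42 0
```

As with the `eval` version, this only papers over the identity mismatch. Instance attributes are carried over untouched, so any real change in what the class expects still has to be handled separately.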
Answer 3:
Two solutions come to mind:
1. Before you pickle, you can set object.__class__:
   >>> class X(object): pass
   >>> class Y(object): pass
   >>> x = X()
   >>> x.__class__ = Y
   >>> type(x)
   <class '__main__.Y'>
   Maybe you can use persistent_id for this, because every object is passed to it.
2. Define __reduce__ to do exactly what pickle does (have a look at pickle.py for this).
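One way the persistent_id idea could be taken further, as a rough sketch rather than a tested recipe: instead of fixing up `__class__`, a `Pickler` subclass can store instances of recompiled classes as a module name, a class name and their state, and a matching `Unpickler` can rebuild them from whatever class currently has that name. Everything named here (`ByNamePickler`, `ByNameUnpickler`, `LIVE_MODULES`, `live_script_pid`) is hypothetical. The sketch assumes instance state lives in `__dict__` and that the object graph has no cycles through such instances, because objects handled by `persistent_id` are not memoized.

```python
import io
import pickle
import sys
import types

LIVE_MODULES = {"live_script_pid"}  # hypothetical: modules that get recompiled


class ByNamePickler(pickle.Pickler):
    def persistent_id(self, obj):
        cls = type(obj)
        if cls.__module__ in LIVE_MODULES and hasattr(obj, "__dict__"):
            # Store plain strings instead of a reference to the class object.
            return ("by-name", cls.__module__, cls.__qualname__, dict(vars(obj)))
        return None  # everything else is pickled as usual


class ByNameUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, module_name, qualname, state = pid
        if tag != "by-name":
            raise pickle.UnpicklingError("unsupported persistent id")
        cls = getattr(sys.modules[module_name], qualname)  # current definition
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


# Simulate a live-edited script with a throwaway module.
script = types.ModuleType("live_script_pid")
sys.modules[script.__name__] = script
exec("class Foo:\n    pass\n", script.__dict__)
foo = script.Foo()
foo.value = 42
exec("class Foo:\n    def bar(self):\n        return 0\n", script.__dict__)  # recompile

buffer = io.BytesIO()
ByNamePickler(buffer).dump({"item": foo})  # no PicklingError, foo is left untouched
buffer.seek(0)
restored = ByNameUnpickler(buffer).load()["item"]
print(type(restored) is script.Foo, restored.value, restored.bar())  # True 42 0
```

The trade-off is that the resulting stream can only be read back with the matching unpickler, not with plain `pickle.loads`.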
Source: https://stackoverflow.com/questions/16269071/python-pickle-dealing-with-updated-class-definitions