Question
I'm doing a fair amount of parallel processing in Python using the multiprocessing module. I know certain objects can be pickled (and thus passed as arguments to multiprocessing workers) and others can't. E.g.:
import pickle
class abc():
    pass
a = abc()
pickle.dumps(a)
'ccopy_reg\n_reconstructor\np1\n(c__main__\nabc\np2\nc__builtin__\nobject\np3\nNtRp4\n.'
But I have some larger classes in my code (a dozen methods, or so), and this happens:
a = myBigClass()
pickle.dumps(a)
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
  File "/usr/apps/Python279/python-2.7.9-rhel5-x86_64/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
It's not a file object, but at other times, I'll get other messages that say basically: "I can't pickle this".
So what's the rule? Number of bytes? Depth of hierarchy? Phase of the moon?
Answer 1:
I'm the dill author. There's a fairly comprehensive list of what pickles and what doesn't as part of dill. It can be run per version of Python 2.5–3.4, and adjusted for what pickles with dill or what pickles with pickle by changing one flag. See here and here.
The root of the rules for what pickles is (off the top of my head):
1. Can you capture the state of the object by reference (i.e. a function defined in __main__ versus an imported function)? [Then, yes]
2. Does a generic __getstate__ and __setstate__ rule exist for the given object type? [Then, yes]
3. Does it depend on a Frame object (i.e. rely on the GIL and global execution stack)? Iterators are now an exception to this, by "replaying" the iterator on unpickling. [Then, no]
4. Does the object instance point to the wrong class path (i.e. due to being defined in a closure, in C-bindings, or other __init__ path manipulations)? [Then, no]
5. Is it considered dangerous by Python to allow this? [Then, no]
So, (5) is less prevalent now than it used to be, but still has some lasting effects in the language for pickle. dill, for the most part, removes (1), (2), and (5), but is still fairly affected by (3) and (4).
I might be forgetting something else, but I think in general those are the underlying rules.
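Some of these rules can be observed with nothing but the standard pickle module. The following is a minimal sketch, assuming Python 3; the helper `can_pickle` and the sample functions are hypothetical names made up for illustration, and the rule numbers in the comments are only a rough mapping onto the list above.

```python
import pickle


def top_level(x):
    """Defined at module level, so pickle can store it by reference."""
    return x + 1


def make_nested():
    """Return a function defined inside another function."""
    def nested(x):
        return x + 1
    return nested


def can_pickle(obj):
    """Return True if the standard pickle module accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


results = {
    # Roughly rule 1: stored as "module + name", nothing else.
    "top-level function": can_pickle(top_level),
    # Roughly rule 4: the name cannot be looked up from the module.
    "nested function": can_pickle(make_nested()),
    "lambda": can_pickle(lambda x: x + 1),
    # Roughly rule 3: a generator carries a live frame.
    "generator": can_pickle(n for n in range(3)),
}

# By-reference pickling stores only the module and name, not the code,
# so unpickling in the same process returns the very same function object.
payload = pickle.dumps(top_level)
same_function = pickle.loads(payload) is top_level

for label, ok in results.items():
    print(f"{label}: {'picklable' if ok else 'not picklable'}")
```

Because only a name is stored, the receiving process must be able to import the same module and find the same name, which is why functions and classes defined at the top level of a module work and locally defined ones do not.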
Certain modules like multiprocessing register some objects that are important for their functioning. dill registers the majority of objects in the language.
The dill fork of multiprocessing is required because multiprocessing uses cPickle, and dill can only augment the pure-Python pickling registry. You could, if you have the patience, go through all the relevant copy_reg functions in dill, and apply them to the cPickle module and you'd get a much more pickle-capable multiprocessing. I've found a simple (read: one-liner) way to do this for pickle, but not cPickle.
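To illustrate what "registering" a type means, here is a minimal sketch, assuming Python 3, where copy_reg is named copyreg. It is not dill's actual code; `rebuild_proxy` and `reduce_proxy` are hypothetical helpers. It registers a reduction function for `types.MappingProxyType`, a built-in type that the standard pickle module rejects by default.

```python
import copyreg
import pickle
import types


def rebuild_proxy(mapping):
    """Constructor used at unpickling time (must be importable by name)."""
    return types.MappingProxyType(mapping)


def reduce_proxy(proxy):
    """Reduction function: return a callable plus the arguments to call it with."""
    return rebuild_proxy, (dict(proxy),)


# Note: this modifies the process-wide copyreg.dispatch_table.
copyreg.pickle(types.MappingProxyType, reduce_proxy)

original = types.MappingProxyType({"a": 1})
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, dict(restored))
```

The reduction function tells the pickler how to rebuild an object from picklable pieces, so the pickle stream contains only a reference to `rebuild_proxy` and a plain dict.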
Answer 2:
From the docs:
The following types can be pickled:
- None, True, and False
- integers, long integers, floating point numbers, complex numbers
- normal and Unicode strings
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).

Attempts to pickle unpicklable objects will raise the PicklingError exception; when this happens, an unspecified number of bytes may have already been written to the underlying file. Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a RuntimeError will be raised in this case. You can carefully raise this limit with sys.setrecursionlimit().
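The "containing only picklable objects" condition is the one that tends to surprise people. A minimal sketch, assuming Python 3 (where the exception raised is sometimes TypeError or AttributeError rather than PicklingError); `Point`, `make_local_instance`, and `pickle_error` are hypothetical names used only for this illustration.

```python
import pickle


class Point:
    """A class defined at the top level of a module."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def make_local_instance():
    """Return an instance of a class that is NOT defined at the top level."""
    class Local:
        pass
    return Local()


def pickle_error(obj):
    """Return None if obj pickles, otherwise the exception that was raised."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return exc
    return None


# Plain data in nested containers round-trips unchanged.
plain = {"n": None, "flag": True, "nums": (1, 2.5, 3 + 4j), "items": [{1, 2}, "abc"]}
plain_copy = pickle.loads(pickle.dumps(plain))

# An instance of a top-level class is rebuilt from its __dict__.
point_copy = pickle.loads(pickle.dumps(Point(1, 2)))

# One unpicklable element makes the whole container unpicklable.
container_error = pickle_error([1, 2, lambda: 3])

# The class of this instance cannot be found by name at unpickling time.
local_error = pickle_error(make_local_instance())

print(plain_copy == plain, vars(point_copy))
print(type(container_error).__name__, type(local_error).__name__)
```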
Answer 3:
The general rule of thumb is that "logical" objects can be pickled, but "resource" objects (files, locks) can't, because it makes no sense to persist/clone them.
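This also explains an error like "can't pickle file objects" on an object that is not itself a file: one of its attributes is. A minimal sketch of tracking the offender down, assuming Python 3 and an instance that keeps its state in `__dict__`; `Job` is a hypothetical stand-in for a larger class and `unpicklable_attributes` is a made-up helper.

```python
import pickle
import tempfile
import threading


class Job:
    """Hypothetical class mixing "logical" state with "resource" objects."""
    def __init__(self, log_file):
        self.name = "job-1"            # logical: plain data
        self.retries = 3               # logical: plain data
        self.log = log_file            # resource: open file handle
        self.lock = threading.Lock()   # resource: lock


def unpicklable_attributes(obj):
    """Map each attribute name that fails to pickle to the exception type name."""
    bad = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # broad on purpose: this is only a diagnostic
            bad[name] = type(exc).__name__
    return bad


with tempfile.TemporaryFile("w") as fh:
    offenders = unpicklable_attributes(Job(fh))

print(sorted(offenders))
```

This only checks one level of attributes; for nested objects the same helper could be applied again to whichever attribute it reports.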
Answer 4:
In addition to icedtrees' answer, also coming straight from the docs, you can customize and control how class instances are pickled and unpickled, using the special methods: object.__getnewargs_ex__(), object.__getnewargs__(), object.__getstate__(), object.__setstate__(state).
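A common use of `__getstate__()` and `__setstate__()` is to leave a resource out of the pickle and recreate it on the other side. A minimal sketch, assuming Python 3; `LogWriter` is a hypothetical class, and reopening the file in append mode is just one possible policy.

```python
import os
import pickle
import tempfile


class LogWriter:
    """Hypothetical class that owns an open file handle."""
    def __init__(self, path):
        self.path = path
        self.lines_written = 0
        self._fh = open(path, "a")

    def write(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()
        self.lines_written += 1

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable file handle.
        state = self.__dict__.copy()
        del state["_fh"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then reopen the file from its path.
        self.__dict__.update(state)
        self._fh = open(self.path, "a")


with tempfile.TemporaryDirectory() as tmp:
    writer = LogWriter(os.path.join(tmp, "out.log"))
    writer.write("first")
    clone = pickle.loads(pickle.dumps(writer))  # works despite the open file
    clone.write("second")
    writer.close()
    clone.close()
    with open(writer.path) as fh:
        contents = fh.read().splitlines()

print(contents, writer.lines_written, clone.lines_written)
```

The clone starts from the state captured at pickling time, so its counter continues from the original's value while the two objects hold separate file handles.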
Source: https://stackoverflow.com/questions/29922373/when-can-a-python-object-be-pickled