Dill is obviously a very useful module, and it seems that as long as you manage the files carefully it is relatively safe. But I was put off by the statement:
Yes
Because pickle allows you to override object serialization and deserialization via:
object.__getstate__()
Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.
object.__setstate__(state)
Upon unpickling, if the class defines __setstate__(), it is called with the unpickled state. In that case, there is no requirement for the state object to be a dictionary. Otherwise, the pickled state must be a dictionary and its items are assigned to the new instance’s dictionary.
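To make the two hooks concrete, here is a minimal sketch of a class that drops an unpicklable attribute in __getstate__() and rebuilds it in __setstate__(). The Connection class and its host and callback attributes are hypothetical, chosen only for illustration. Note that __setstate__() is ordinary Python code that runs while the data is being loaded.

```python
import pickle


class Connection:
    """Hypothetical class holding an attribute that pickle cannot serialize."""

    def __init__(self, host):
        self.host = host
        self.callback = lambda: None  # lambdas cannot be pickled by reference

    def __getstate__(self):
        # Called by pickle.dumps: the returned object is stored instead of __dict__.
        state = self.__dict__.copy()
        del state["callback"]
        return state

    def __setstate__(self, state):
        # Called by pickle.loads with the unpickled state; any code can run here.
        self.__dict__.update(state)
        self.callback = lambda: None  # rebuild the attribute we dropped


restored = pickle.loads(pickle.dumps(Connection("example.org")))
print(restored.host)                # example.org
print(callable(restored.callback))  # True
```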
Because these methods can execute arbitrary code at the user's permission level, it is relatively easy to write a malicious deserializer, e.g. one that deletes all the files on your hard disk.
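Strictly speaking, __getstate__() and __setstate__() run code from classes the loading side already has. The more direct vector is __reduce__(), which lets the pickle itself name any importable callable plus its arguments, to be called at load time. A harmless sketch, assuming a hypothetical attacker-side class named Payload and using eval("1 + 1") where a real attack would name something like os.system:

```python
import pickle


class Payload:
    """Hypothetical attacker-side class, used only to produce the bytes."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('1 + 1')".
        return (eval, ("1 + 1",))


data = pickle.dumps(Payload())

# The loading side never needs the Payload class: the bytes only reference builtins.eval.
result = pickle.loads(data)
print(result)  # 2
```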
Dill is built on top of pickle, and the warnings apply just as much to pickle as they do to dill.
Pickle uses a small stack-based language that can effectively execute arbitrary Python code. An attacker can sneak in instructions to open up a backdoor to your machine, for example. Don't ever use pickled data from untrusted sources.
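The standard library's pickletools module can show that stack program without executing it. A sketch using the same kind of hypothetical eval payload: the listing should contain an opcode that pushes builtins.eval (GLOBAL or STACK_GLOBAL, depending on the protocol) followed by REDUCE, which calls it with the argument tuple.

```python
import io
import pickle
import pickletools


class Payload:
    """Hypothetical payload class; only used to build the bytes."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


data = pickle.dumps(Payload())

# Disassemble the opcode stream; pickletools.dis does not run the program.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```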
The documentation includes an explicit warning:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
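The pickle documentation also suggests signing data with hmac if you need to ensure it has not been tampered with. A minimal sketch of that idea; SECRET_KEY, sign_and_dump and verify_and_load are hypothetical names, and the scheme assumes the key is shared only between trusted parties. It authenticates the source; it does not make data from an untrusted source safe.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice load it from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_and_dump(obj):
    """Pickle obj and prepend a 32-byte HMAC-SHA256 tag over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob):
    """Unpickle only if the tag matches; otherwise refuse before loading."""
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign_and_dump({"answer": 42})
print(verify_and_load(blob))  # {'answer': 42}

# Flip one bit in the last byte to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    verify_and_load(tampered)
except ValueError as exc:
    print(exc)  # signature mismatch: refusing to unpickle
```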
Although I don't see a warning, does a similar situation also exist for pickle?
Never, ever assume that something is safe to use just because nobody has stated that it is dangerous.
That being said, the pickle docs do say the same:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
So yes, that security risk exists on pickle, too.
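If you must load pickles whose contents you don't fully control, the pickle documentation describes restricting globals by overriding Unpickler.find_class(). A sketch along those lines; the allow-list SAFE_BUILTINS and the helper restricted_loads are hypothetical, and an allow-list narrows the attack surface rather than guaranteeing safety:

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Hypothetical malicious payload, as in a __reduce__-based attack."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


print(restricted_loads(pickle.dumps([1, 2, range(3)])))  # [1, 2, range(0, 3)]

try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print(exc)  # global 'builtins.eval' is forbidden
```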
To explain the background: pickle and dill restore the state of Python objects. In CPython, the default Python implementation, this means restoring PyObject structs, some of which contain a length field. Tampering with such a field, for example, can lead to strange behavior and arbitrary effects on your Python process's memory.
By the way, even if the data is not malicious, that doesn't mean you can unpickle or un-dill just about anything, e.g. data that comes from a different Python version. So, to me, that question is a bit of a theoretical one: if you need portable objects, you will have to implement a rock-solid serialization/deserialization mechanism that transports exactly the data you need transported, nothing more and nothing less.
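As an illustration of such an explicit mechanism, here is a minimal sketch using json and a dataclass. The Measurement type and its fields are hypothetical; the point is that the loader only ever builds plain JSON values and then validates exactly the fields it expects, so no code named by the sender can run.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Measurement:
    """Hypothetical record type: only these fields are ever transported."""
    sensor: str
    value: float


def dump_measurement(m):
    return json.dumps(asdict(m))


def load_measurement(text):
    """Parse JSON and accept it only if it matches the expected schema."""
    raw = json.loads(text)  # builds only dicts, lists, strings, numbers, booleans, None
    if not isinstance(raw, dict) or set(raw) != {"sensor", "value"}:
        raise ValueError("unexpected structure")
    sensor, value = raw["sensor"], raw["value"]
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(sensor, str) or isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("wrong field types")
    return Measurement(sensor=sensor, value=float(value))


text = dump_measurement(Measurement("t1", 21.5))
print(text)                    # {"sensor": "t1", "value": 21.5}
print(load_measurement(text))  # Measurement(sensor='t1', value=21.5)
```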