How to pickle functions/classes defined in __main__ (python)

清酒与你 2021-01-18 14:08

I would like to be able to pickle a function or class from within __main__, with the obvious problem (mentioned in other posts) that the pickled function/class is in the __m

3 Answers
  • 2021-01-18 14:37

    If you are trying to pickle something so that you can use it somewhere else, separately from test_script, that's not going to work: pickle stores a function by reference (its module and name, not its code) and simply tries to import it from that module again when loading. Here's an example:

    test_script.py

    def my_awesome_function(x, y, z):
        return x + y + z
    

    picklescript.py

    import pickle
    import test_script
    with open("awesome.pickle", "wb") as f:
        pickle.dump(test_script.my_awesome_function, f)
    

    If you run python picklescript.py and then rename (or remove) test_script.py, loading the function will fail. For example, running this:

    import pickle
    with open("awesome.pickle", "rb") as f:
        pickle.load(f)
    

    Will give you the following traceback:

    Traceback (most recent call last):
      File "load_pickle.py", line 3, in <module>
        pickle.load(f)
      File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1378, in load
        return Unpickler(file).load()
      File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 858, in load
        dispatch[key](self)
      File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1090, in load_global
        klass = self.find_class(module, name)
      File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/pickle.py", line 1124, in find_class
        __import__(module)
    ImportError: No module named test_script
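
The ImportError follows from how pickle treats functions: the file holds only a reference (module name plus function name), not the function body. Here is a minimal sketch of that behaviour. It uses json.dumps from the standard library as a stand-in for test_script.my_awesome_function so that it runs without extra files, and global_references is a helper name made up for this illustration.

```python
import json
import pickle
import pickletools


def global_references(payload):
    """List the "module name" pairs that a protocol-2 pickle refers to."""
    return [arg for opcode, arg, _pos in pickletools.genops(payload)
            if opcode.name == "GLOBAL"]


# Protocol 2 keeps the opcode stream simple enough to inspect.
payload = pickle.dumps(json.dumps, protocol=2)

# The only thing stored is the pair "json dumps"; no bytecode, no source.
references = global_references(payload)
print(references)

# Loading re-imports the module and looks the name up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the module named in the pickle cannot be imported at load time, the lookup fails exactly as in the traceback above.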
    
  • 2021-01-18 14:38

    Pickle looks up a class or function by name in the module recorded in the pickle; for objects pickled from a script that was run directly, that module is __main__. From inside the module you're unpickling in, try this:

    import myscript
    import __main__

    # Expose the class under the name the pickle expects to find in __main__.
    __main__.myclass = myscript.myclass
    # Unpickle anywhere after this.
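
To see why this aliasing helps, here is a self-contained sketch of the same idea. The class Point and the helper try_load are hypothetical names for illustration only; setting __module__ to "__main__" just mimics a class that was defined in a script run directly.

```python
import pickle
import __main__

# Hypothetical stand-in for myscript.myclass. Setting __module__ to
# "__main__" mimics a class defined in a script that was run directly.
point_cls = type("Point", (object,), {"__module__": "__main__"})

# While the name is reachable from __main__, pickling works.
__main__.Point = point_cls
original = point_cls()
original.x = 3
payload = pickle.dumps(original)


def try_load(data):
    """Return the unpickled object, or None if the class cannot be found."""
    try:
        return pickle.loads(data)
    except AttributeError:
        return None


# Simulate a different program whose __main__ has no Point.
del __main__.Point
missing = try_load(payload)

# Apply the workaround: put the class back on __main__, then load.
__main__.Point = point_cls
restored = try_load(payload)
print(missing, restored.x)
```

The load only succeeds once the expected name exists on __main__, which is all the assignment in the snippet above arranges.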
    
  • 2021-01-18 14:40

    You can get a better handle on global objects by importing __main__ and using the attributes available on that module. This is what dill does in order to serialize almost anything in Python. Basically, when dill serializes an interactively defined function, it applies some name mangling to __main__ on both the serialization and deserialization sides, so that __main__ is treated as a valid module.

    >>> import dill
    >>> 
    >>> def bar(x):
    ...   return foo(x) + x
    ... 
    >>> def foo(x):
    ...   return x**2
    ... 
    >>> bar(3)
    12
    >>> 
    >>> _bar = dill.loads(dill.dumps(bar))
    >>> _bar(3)
    12
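
This is not dill's implementation. As a rough standard-library illustration of shipping a function by value rather than by reference, the sketch below serializes the code object with marshal and rebuilds the function with types.FunctionType. It assumes the same Python version on both sides and functions without closures; dump_function and load_function are names invented for this example.

```python
import marshal
import types


def dump_function(func):
    """Serialize a plain function by value (no closures, same Python version)."""
    return marshal.dumps((func.__name__, func.__code__, func.__defaults__))


def load_function(data, namespace):
    """Rebuild the function so that its globals are `namespace`."""
    name, code, defaults = marshal.loads(data)
    return types.FunctionType(code, namespace, name, defaults)


def foo(x):
    return x ** 2


def bar(x):
    return foo(x) + x


# Rebuild both functions in a fresh namespace, as a receiving process would.
namespace = {}
for func in (foo, bar):
    rebuilt = load_function(dump_function(func), namespace)
    namespace[rebuilt.__name__] = rebuilt

print(namespace["bar"](3))  # 12, matching the dill session above
```

Because bar looks up foo in its globals at call time, both functions have to end up in the same namespace on the receiving side.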
    

    Actually, dill registers its types into the pickle registry, so if you have some black-box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the third-party code.

    Or, if you want the whole interpreter session sent over as a "Python image", dill can do that too.

    >>> # continuing from above
    >>> dill.dump_session('foobar.pkl')
    >>>
    >>> ^D
    dude@sakurai>$ python
    Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
    [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> dill.load_session('foobar.pkl')
    >>> _bar(3)
    12
    

    You can easily send the image across ssh to another computer and pick up where you left off there, as long as the pickle versions are compatible and the usual caveats apply about Python versions changing and packages being installed.

    I actually use dill to serialize objects and send them across parallel resources with parallel python, multiprocessing, and mpi4py. I roll these up conveniently into the pathos package (and pyina for MPI), which provides a uniform map interface for different parallel batch processing backends.

    >>> # continued from above
    >>> from pathos.multiprocessing import ProcessingPool as Pool
    >>> Pool(4).map(foo, range(10))
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    >>>
    >>> from pyina.launchers import MpiPool
    >>> MpiPool(4).map(foo, range(10))
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    

    There are also non-blocking and iterative maps, as well as non-parallel pipe connections. I also have a pathos module for pp; however, it is somewhat unstable for functions defined in __main__. I'm working on improving that. If you like, fork the code on GitHub and help make pp better for functions defined in __main__. The reason pp doesn't pickle well is that pp does its serialization tricks by using temporary file objects and reading the interpreter session's history, so it doesn't serialize objects in the same way that multiprocessing or mpi4py do. I have a dill module, dill.source, that seamlessly does the same type of pickling that pp uses, but it's rather new.
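
Source-based serialization can be sketched with the standard library alone. This is neither pp's nor dill.source's actual code, only an illustration of the idea: ship the function's source text and re-execute it on the receiving side. The names FOO_SOURCE, dump_source and load_source are made up for this example.

```python
import pickle

# Source text of the function to ship; a real tool would recover this
# from a file or from the interpreter history instead of a literal.
FOO_SOURCE = "def foo(x):\n    return x ** 2\n"


def dump_source(name, source):
    """Pickle the function name together with its source text."""
    return pickle.dumps((name, source))


def load_source(data):
    """Re-execute the source in a fresh namespace and return the function."""
    name, source = pickle.loads(data)
    namespace = {}
    exec(source, namespace)
    return namespace[name]


foo = load_source(dump_source("foo", FOO_SOURCE))
print([foo(i) for i in range(10)])
```

As with unpickling in general, executing received source is only reasonable when the sender is trusted.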
