Serialize a Python function with dependencies

礼貌的吻别 2020-12-30 09:14

I have tried multiple approaches to pickle a Python function with dependencies, following many recommendations on StackOverflow (such as dill, cloudpickle, etc.), but all se

1 Answer
  • 2020-12-30 09:18

    I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference… so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
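
To make "by reference" concrete, here is a minimal sketch using only the standard library `pickle` (not dill itself). The choice of `json.dumps` as the function to pickle is just an illustration. The payload carries the module and function names and nothing else, which is why the receiving machine must be able to import the same module.

```python
import json
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(json.dumps)

# The payload holds only the module name and the function name,
# not the function's code, so it stays tiny.
print(len(payload), payload)

# Unpickling re-imports the module and looks the name up again,
# so we get back the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the module were missing on the receiving side, the final lookup would fail with an import error, which is the situation described above.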

    Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference… so the class definition can also be serialized and sent with the instance. In dill, you can also (using a very new feature) serialize file handles by serializing the file itself, instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.

    You might be able to use dill to do so, however, not by pickling the object but by extracting the source and sending the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions), and pass them to another computer/process/etc. However, since this is not an easy thing to do, dill can also fall back to extracting a relevant import statement and sending that instead of the source code.
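
As a rough sketch of the receiving end of the "send the source" idea: once the source text has arrived, it can be executed in a fresh namespace to rebuild the function. This uses only built-in `exec`. The helper `load_from_source` and the sample `circle_area` function are hypothetical names for illustration, and this is not how pathos or pyina implement it.

```python
def load_from_source(source, name):
    """Rebuild an object from shipped source text (hypothetical helper)."""
    namespace = {}
    # exec runs arbitrary code: only do this with source you trust.
    exec(source, namespace)
    return namespace[name]

# Stand-in for text received from the other machine. Note that it
# carries its own import, i.e. its dependency travels with it.
shipped_source = (
    "import math\n"
    "\n"
    "def circle_area(radius):\n"
    "    return math.pi * radius ** 2\n"
)

circle_area = load_from_source(shipped_source, "circle_area")
print(circle_area(2.0))
```

The sender's job is then to produce source text that is complete enough to run on its own, which is the hard part.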

    You can understand, hopefully, this is a messy messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking is successfully done in the pathos package to pass code and dependencies to different machines across ssh-tunneled ports.

    >>> import dill
    >>> 
    >>> print(dill.source.importable(dill.source.importable))
    from dill.source import importable
    >>> print(dill.source.importable(dill.source.importable, source=True))
    def _closuredsource(func, alias=''):
        """get source code for closured objects; return a dict of 'name'
        and 'code blocks'"""
        #FIXME: this entire function is a messy messy HACK
        #      - pollutes global namespace
        #      - fails if name of freevars are reused
        #      - can unnecessarily duplicate function code
        from dill.detect import freevars
        free_vars = freevars(func)
        func_vars = {}
        # split into 'funcs' and 'non-funcs'
        for name,obj in list(free_vars.items()):
            if not isfunction(obj):
                # get source for 'non-funcs'
                free_vars[name] = getsource(obj, force=True, alias=name)
                continue
            # get source for 'funcs'
    
    #…snip… …snip… …snip… …snip… …snip… 
    
                # get source code of objects referred to by obj in global scope
                from dill.detect import globalvars
                obj = globalvars(obj) #XXX: don't worry about alias?
                obj = list(getsource(_obj,name,force=True) for (name,_obj) in obj.items())
                obj = '\n'.join(obj) if obj else ''
                # combine all referred-to source (global then enclosing)
                if not obj: return src
                if not src: return obj
                return obj + src
            except:
                if tried_import: raise
                tried_source = True
                source = not source
        # should never get here
        return
    

    I imagine something could also be built around the dill.detect.parents function, which provides a list of pointers to all parent objects for any given object… and one could reconstruct all of any function's dependencies as objects… but this is not implemented.
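
To give a feel for what "reconstructing a function's dependencies as objects" could involve, here is a small standard-library sketch that walks the global names a function refers to. It is not dill's implementation and does not use `dill.detect.parents`. The names `global_dependencies`, `SCALE`, `helper`, and `target` are made up for the example.

```python
import types

def global_dependencies(func, found=None):
    """Collect module-level objects that func, and functions it uses, refer to."""
    found = {} if found is None else found
    for name in func.__code__.co_names:
        # co_names also lists attribute names, so keep a name only if it
        # really is a key in the function's global namespace.
        if name in found or name not in func.__globals__:
            continue
        obj = func.__globals__[name]
        found[name] = obj
        # Follow plain Python functions to pick up their globals too.
        if isinstance(obj, types.FunctionType):
            global_dependencies(obj, found)
    return found

SCALE = 3

def helper(x):
    return x * SCALE

def target(x):
    return helper(x) + len(str(x))

deps = global_dependencies(target)
print(sorted(deps))
```

Builtins such as `len` are skipped because they are not in the module's globals. The sketch also ignores closures and nested code objects, which is part of why the real problem is messy.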

    BTW: to establish an ssh tunnel, just do this:

    >>> t = pathos.Tunnel.Tunnel()
    >>> t.connect('login.university.edu')
    39322
    >>> t  
    Tunnel('-q -N -L39322:login.university.edu:45075 login.university.edu')
    

    Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.
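
The string shown in the `Tunnel` repr above reads like ordinary `ssh` options: `-q` (quiet), `-N` (no remote command), and `-L` (local port forward). As a sketch, and assuming that this is all that is needed for a plain local forward, the same argument list could be assembled by hand. The helper name `ssh_tunnel_command` is hypothetical, and nothing here is taken from pathos internals.

```python
def ssh_tunnel_command(host, local_port, remote_port):
    """Build an ssh argument list for a local port forward (hypothetical helper)."""
    # -L<local_port>:<host>:<remote_port> forwards the local port to the remote one.
    forward = f"-L{local_port}:{host}:{remote_port}"
    return ["ssh", "-q", "-N", forward, host]

# Ports and host copied from the transcript above.
command = ssh_tunnel_command("login.university.edu", 39322, 45075)
print(" ".join(command))
```

The list could be handed to `subprocess.Popen` to keep the tunnel open in the background. This sketch only builds the command and does not connect anywhere.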
