Multiprocessing: Share Unserializable Objects Between Processes

Asked by 抹茶落季 on 2020-12-03 07:38

There are three questions that look like possible duplicates (but they are too specific):

  • How to properly set up multiprocessing proxy objects for objects that already exist
3 Answers
  • 2020-12-03 08:37

    Most of the time it's not really desirable to pass a reference to an existing object to another process. Instead, you create the class you want to share between processes:

    class MySharedClass:
        # stuff...
    

    Then you make a proxy manager like this:

    import multiprocessing.managers as m
    class MyManager(m.BaseManager):
        pass # Pass is really enough. Nothing needs to be done here.
    

    Then you register your class on that Manager, like this:

    MyManager.register("MySharedClass", MySharedClass)
    

    Then, once the manager is instantiated and started with manager.start(), you can create shared instances of your class with manager.MySharedClass(). This should cover most use cases. The returned proxy works just like the original object, apart from a few exceptions described in the documentation.
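
    Putting those steps together, a minimal end-to-end sketch might look like this (the counter attribute, the increment/get methods, and the worker function are hypothetical, added only for illustration):

    import multiprocessing
    import multiprocessing.managers as m

    class MySharedClass:
        def __init__(self):
            self.value = 0

        def increment(self):
            self.value += 1

        def get(self):
            return self.value

    class MyManager(m.BaseManager):
        pass

    MyManager.register("MySharedClass", MySharedClass)

    def worker(shared):
        shared.increment()  # the call is forwarded to the manager process

    if __name__ == '__main__':
        manager = MyManager()
        manager.start()
        shared = manager.MySharedClass()  # returns a proxy, safe to pass around
        p = multiprocessing.Process(target=worker, args=(shared,))
        p.start()
        p.join()
        print(shared.get())  # 1 -- the child's update is visible here

    Note that attribute access goes through proxy methods, which is why this sketch exposes the counter via get() rather than reading shared.value directly.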

  • 2020-12-03 08:40

    Before reading this answer, please note that the solution it describes is terrible; see the warning at the end.

    I found a way to share the state of an object through multiprocessing.Array, so I made this class that transparently shares its state across all processes:

    import multiprocessing as m
    import pickle
    
    class Store:
        pass
    
    class Shareable:
        def __init__(self, size = 2**10):
            object.__setattr__(self, 'store', m.Array('B', size))
            o = Store() # This object will hold all shared values
            s = pickle.dumps(o)
            store(object.__getattribute__(self, 'store'), s)
    
        def __getattr__(self, name):
            s = load(object.__getattribute__(self, 'store'))
            o = pickle.loads(s)
            return getattr(o, name)
    
        def __setattr__(self, name, value):
            s = load(object.__getattribute__(self, 'store'))
            o = pickle.loads(s)
            setattr(o, name, value)
            s = pickle.dumps(o)
            store(object.__getattribute__(self, 'store'), s)
    
    def store(arr, s):
        # Write the pickled bytes into the shared array one byte at a time;
        # this raises IndexError if the pickle is larger than the array.
        for i, ch in enumerate(s):
            arr[i] = ch
    
    def load(arr):
        # Copy the raw bytes back out. pickle.loads stops at the STOP
        # opcode, so any trailing zero padding in the array is harmless.
        return bytes(arr[:])
    

    You can pass instances of this class (and its subclasses) to any other process, and it will synchronize its state across all processes. This was tested with this code:

    class Foo(Shareable):
        def __init__(self):
            super().__init__()
            self.f = 1
    
        def foo(self):
            self.f += 1
    
    def f(s):
        s.f += 1
    
    if __name__ == '__main__':
        import multiprocessing as m
        import time
        s = Foo()
        print(s.f)
        p = m.Process(target=f, args=(s,))
        p.start()
        time.sleep(1)
        print(s.f)
    

    The "magic" of this class is that it stores all of it attributes in another instance of the class Store. This class isn't very special. It's just some class that can have arbitrary attributes. (A dict would have done as well.)

    However, this class has some really nasty quirks. I found two.

    The first quirk is that you have to specify in advance how much space the Store instance may take, because multiprocessing.Array has a fixed size. The pickled object can therefore be at most as large as the array; if it grows beyond that, store() raises an IndexError. You can reserve more room through the size parameter of Shareable.__init__, as sketched below.
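
    A sketch of a subclass that reserves a bigger buffer (2**16 is an arbitrary choice, for illustration only):

    class BigFoo(Shareable):
        def __init__(self):
            super().__init__(size=2**16)  # room for larger pickles
            self.f = 1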

    The second quirk is that you can't use this class with a ProcessPoolExecutor or a plain Pool. Those send arguments to workers through pickled task queues, whereas a shared Array may only be handed to a child at Process creation time (through inheritance). If you try, you get an error:

    >>> s = Foo()
    >>> with ProcessPoolExecutor(1) as e:
    ...     e.submit(f, s)
    ... 
    <Future at 0xb70fe20c state=running>
    Traceback (most recent call last):
    <omitted>
    RuntimeError: SynchronizedArray objects should only be shared between processes through inheritance
    

    Warning
    You should probably not use this approach, as it uses an uncontrollable amount of memory, is overly complicated compared to using a proxy (see my other answer) and might crash in spectacular ways.

  • 2020-12-03 08:42

    Just use Stackless Python. You can serialize almost anything with pickle, including functions. Here I serialize and deserialize a lambda using the pickle module. This is similar to what you are trying to do in your example.

    Here is the download link for Stackless Python http://www.stackless.com/wiki/Download

    Python 2.7.5 Stackless 3.1b3 060516 (default, Sep 23 2013, 20:17:03) 
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = 5
    >>> g = lambda : f * f
    >>> g()
    25
    >>> import pickle
    >>> p = pickle.dumps(g)
    >>> m = pickle.loads(p)
    >>> m()
    25
    >>> 
    
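    For contrast, the same round-trip fails under standard CPython, which pickles functions by reference and cannot resolve a module-level name for a lambda (a sketch; the exact message varies by version):

    >>> import pickle
    >>> pickle.dumps(lambda: 5 * 5)
    Traceback (most recent call last):
    <omitted>
    _pickle.PicklingError: Can't pickle <function <lambda> at 0x...>: attribute lookup <lambda> on __main__ failed
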