I am trying to parallelize a code using class methods via multiprocessing. The basic structure is the following:
# from multiprocessing import Pool
from path
In multiprocessing function serialization is needed for interprocess communication. Pickle does a poor job for this purpose, install dill via pip instead. Details (with a nice Star Trek example) can be found here: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
pathos
uses dill
, and dill
serializes classes differently than python's pickle
module does. pickle
serializes classes by reference. dill
(by default) serializes classes directly, and only optionally by reference.
>>> import dill
>>>
>>> class Foo(object):
... def __init__(self, x):
... self.x = x
... def bar(self, y):
... return self.x + y * z
... z = 1
...
>>> f = Foo(2)
>>>
>>> dill.dumps(f) # the dill default, explicitly serialize a class
'\x80\x02cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01U\x08TypeTypeq\x02\x85q\x03Rq\x04U\x03Fooq\x05h\x01U\nObjectTypeq\x06\x85q\x07Rq\x08\x85q\t}q\n(U\r__slotnames__q\x0b]q\x0cU\n__module__q\rU\x08__main__q\x0eU\x03barq\x0fcdill.dill\n_create_function\nq\x10(cdill.dill\n_unmarshal\nq\x11Uyc\x02\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00C\x00\x00\x00s\x0f\x00\x00\x00|\x00\x00j\x00\x00|\x01\x00t\x01\x00\x14\x17S(\x01\x00\x00\x00N(\x02\x00\x00\x00t\x01\x00\x00\x00xt\x01\x00\x00\x00z(\x02\x00\x00\x00t\x04\x00\x00\x00selft\x01\x00\x00\x00y(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x03\x00\x00\x00bar\x04\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x12\x85q\x13Rq\x14c__builtin__\n__main__\nh\x0fNN}q\x15tq\x16Rq\x17U\x01zq\x18K\x01U\x07__doc__q\x19NU\x08__init__q\x1ah\x10(h\x11Uuc\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\r\x00\x00\x00|\x01\x00|\x00\x00_\x00\x00d\x00\x00S(\x01\x00\x00\x00N(\x01\x00\x00\x00t\x01\x00\x00\x00x(\x02\x00\x00\x00t\x04\x00\x00\x00selfR\x00\x00\x00\x00(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>t\x08\x00\x00\x00__init__\x02\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x1b\x85q\x1cRq\x1dc__builtin__\n__main__\nh\x1aNN}q\x1etq\x1fRq utq!Rq")\x81q#}q$U\x01xq%K\x02sb.'
>>> dill.dumps(f, byref=True) # the pickle default, serialize by reference
'\x80\x02c__main__\nFoo\nq\x00)\x81q\x01}q\x02U\x01xq\x03K\x02sb.'
Not serializing by reference is much more flexible. However, in rare circumstances, working with references is better (as it appears to be the case when pickling something built on a SwigPyObject
).
I have been meaning (for ~2 years) to expose the byref
flag to the dump
call inside of pathos
, but have not done so yet. It should be a fairly simple edit to do so. I've just added a ticket to do so: https://github.com/uqfoundation/pathos/issues/58. While I'm at it, it should also be easy to open up replacement of the dump
and load
functions that pathos
uses… that way you could use customized serializers (i.e. extend those that dill
provides, or use some other serializer).