Question
I want to share a numpy array across multiple processes. The processes only read the data, so I want to avoid making copies. I know how to do it if I can start with a multiprocessing.sharedctypes.RawArray and then create a numpy array using numpy.frombuffer. But what if I am initially given a numpy array? Is there a way to initialize a RawArray with the numpy array's data without copying the data? Or is there another way to share the data across the processes without copying it?
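For context, the pattern the question refers to (allocate shared memory first, then view it as a numpy array) can be sketched like this; the sizes and dtype are illustrative:

```python
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# allocate a shared, zero-filled buffer of 8 doubles
raw = RawArray(ctypes.c_double, 8)
# wrap it in a numpy array without copying: frombuffer returns a view
arr = np.frombuffer(raw, dtype=np.float64)
arr[:] = range(8)  # writes through the view land in the shared buffer
```

Because `arr` is a view, child processes that inherit `raw` see the same bytes.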
Answer 1:
I have some of the same requirements: a) a large numpy array, b) shared among a bunch of processes, c) read-only, etc. For this I have been using something along the lines of:
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

mynparray = ...  # initialize a large 3-D array from a file
length, rows, cols = mynparray.shape
shrarr_base_ptr = RawArray(ctypes.c_double, length * rows * cols)
shrarr_ptr = np.frombuffer(shrarr_base_ptr).reshape(mynparray.shape)
shrarr_ptr[:] = mynparray  # one-time copy of the data into the shared buffer
where in my case, mynparray is 3-D. As for the actual sharing, I used the following style and it works so far.
from multiprocessing import Process, Queue

inq1 = Queue()
inq2 = Queue()
outq = Queue()
p1 = Process(target=myfunc1, args=(inq1, outq))
p1.start()
inq1.put((shrarr_ptr,))
p2 = Process(target=myfunc2, args=(inq2, outq))
p2.start()
inq2.put((shrarr_ptr,))
inq1.close()
inq2.close()
inq1.join_thread()
inq2.join_thread()
....
Answer 2:
To my knowledge it is not possible to declare memory as shared after it was assigned to a specific process. Similar discussions can be found here and here (more suitable).
Let me quickly sketch the workaround you mentioned (starting with a RawArray and getting a numpy.ndarray reference to it).
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# option 1
raw_arr = RawArray(ctypes.c_int, 12)
# option 2 (set it up similar to some existing np.ndarray np_arr2)
raw_arr = RawArray(
    np.ctypeslib.as_ctypes_type(np_arr2.dtype), len(np_arr2)
)
np_arr = np.frombuffer(raw_arr, dtype=raw_arr._type_)
# np_arr: numpy array backed by shared memory, usable with multiprocessing
If you have to start with a numpy.ndarray, you have no choice but to copy the data:
import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
# option 1: convert the whole array to a ctypes object first
tmp = np.ctypeslib.as_ctypes(np_arr)
raw_arr = RawArray(tmp._type_, tmp)
# option 2: flatten and let the RawArray constructor copy the elements
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())
print(raw_arr[:])
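That this really is a copy can be checked directly: mutating the source array after construction leaves the RawArray untouched. A quick sketch:

```python
import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())
np_arr[0, 0] = 7  # mutate the numpy array after construction
# raw_arr still holds the old value 0: the constructor copied the data
```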
Answer 3:
I'm not sure if this copies the data internally, but you could pass the flat array:
>>> import ctypes
>>> import numpy
>>> from multiprocessing.sharedctypes import RawArray
>>> a = numpy.random.randint(1, 10, (4, 4))
>>> a
array([[5, 6, 7, 7],
       [7, 9, 2, 8],
       [3, 4, 6, 4],
       [3, 1, 2, 2]])
>>> b = RawArray(ctypes.c_long, a.flat)
>>> b[:]
[5, 6, 7, 7, 7, 9, 2, 8, 3, 4, 6, 4, 3, 1, 2, 2]
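It does copy: initializing a RawArray from an iterable fills the new shared buffer element by element, so later changes to the source array are not reflected. A quick check (illustrative values):

```python
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

a = np.arange(16).reshape(4, 4)
b = RawArray(ctypes.c_long, a.flat)  # constructor copies the values
a[0, 0] = 99                         # change the source array afterwards
# b[0] is still 0, so b is an independent copy, not a view of a
```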
Source: https://stackoverflow.com/questions/26302456/rawarray-from-numpy-array