Question
I want to share a numpy array across multiple processes. The processes only read the data, so I want to avoid making copies. I know how to do it if I can start with a multiprocessing.sharedctypes.RawArray and then create a numpy array using numpy.frombuffer. But what if I am initially given a numpy array? Is there a way to initialize a RawArray with the numpy array's data without copying the data? Or is there another way to share the data across the processes without copying it?
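For context, the pattern the question refers to (allocate shared memory first, then view it as a numpy array) can be sketched like this; the sizes and dtype are illustrative:

```python
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# allocate a shared, zero-filled buffer of 8 doubles
raw = RawArray(ctypes.c_double, 8)
# wrap it in a numpy array without copying: frombuffer returns a view
arr = np.frombuffer(raw, dtype=np.float64)
arr[:] = range(8)  # writes through the view land in the shared buffer
```

Because `arr` is a view, child processes that inherit `raw` see the same bytes.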
Answer 1:
I have some of the same requirements: a) a large numpy array, b) shared among a bunch of processes, c) read-only, etc. For this I have been using something along the lines of:
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

mynparray = ...  # initialize a large 3-D array from a file
length, rows, cols = mynparray.shape
shrarr_base_ptr = RawArray(ctypes.c_double, length * rows * cols)
shrarr_ptr = np.frombuffer(shrarr_base_ptr).reshape(mynparray.shape)
shrarr_ptr[:] = mynparray  # one-time copy of the data into the shared buffer
where in my case, mynparray is 3-D. As for the actual sharing, I used the following style and it works so far.
from multiprocessing import Process, Queue

inq1 = Queue()
inq2 = Queue()
outq = Queue()
p1 = Process(target=myfunc1, args=(inq1, outq))
p1.start()
inq1.put((shrarr_ptr,))
p2 = Process(target=myfunc2, args=(inq2, outq))
p2.start()
inq2.put((shrarr_ptr,))
inq1.close()
inq2.close()
inq1.join_thread()
inq2.join_thread()
....
Answer 2:
To my knowledge it is not possible to declare memory as shared after it was assigned to a specific process. Similar discussions can be found here and here (more suitable).
Let me quickly sketch the workaround you mentioned (starting with a RawArray and getting a numpy.ndarray reference to it).
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# option 1
raw_arr = RawArray(ctypes.c_int, 12)
# option 2 (set it up similar to some existing np.ndarray np_arr2)
raw_arr = RawArray(
    np.ctypeslib.as_ctypes_type(np_arr2.dtype), len(np_arr2)
)
np_arr = np.frombuffer(raw_arr, dtype=raw_arr._type_)
# np_arr: numpy array backed by shared memory, usable with multiprocessing
If you have to start with a numpy.ndarray, you have no choice but to copy the data:
import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
# option 1: convert the whole array to a ctypes object first
tmp = np.ctypeslib.as_ctypes(np_arr)
raw_arr = RawArray(tmp._type_, tmp)
# option 2: flatten and let the RawArray constructor copy the elements
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())
print(raw_arr[:])
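That this really is a copy can be checked directly: mutating the source array after construction leaves the RawArray untouched. A quick sketch:

```python
import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())
np_arr[0, 0] = 7  # mutate the numpy array after construction
# raw_arr still holds the old value 0: the constructor copied the data
```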
Answer 3:
I'm not sure if this copies the data internally, but you could pass the flat array:
>>> import ctypes
>>> import numpy
>>> from multiprocessing.sharedctypes import RawArray
>>> a = numpy.random.randint(1, 10, (4, 4))
>>> a
array([[5, 6, 7, 7],
       [7, 9, 2, 8],
       [3, 4, 6, 4],
       [3, 1, 2, 2]])
>>> b = RawArray(ctypes.c_long, a.flat)
>>> b[:]
[5, 6, 7, 7, 7, 9, 2, 8, 3, 4, 6, 4, 3, 1, 2, 2]
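It does copy: initializing a RawArray from an iterable fills the new shared buffer element by element, so later changes to the source array are not reflected. A quick check (illustrative values):

```python
import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

a = np.arange(16).reshape(4, 4)
b = RawArray(ctypes.c_long, a.flat)  # constructor copies the values
a[0, 0] = 99                         # change the source array afterwards
# b[0] is still 0, so b is an independent copy, not a view of a
```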
Source: https://stackoverflow.com/questions/26302456/rawarray-from-numpy-array