In Python it is possible to share ctypes objects between multiple processes. However, I notice that allocating these objects seems to be extremely expensive.
Not an answer (the accepted answer explains this quite well), but for those looking for how to fix this, here's how: don't use RawArray's slice assignment operator.
As noted in the accepted answer, RawArray's slice assignment operator doesn't take advantage of the fact that you're copying between two wrappers around C-style arrays of identical type and size. But RawArray implements the buffer protocol, so you can wrap it in a memoryview to access it in an "even more raw" way (and it will make foo2 win, because you can only do this after constructing the object, not as part of construction):
import ctypes as ct
import multiprocessing.sharedctypes as sct

def foo2():
    sh = sct.RawArray(ct.c_int, len(l))
    # l must be another buffer protocol object w/ the same C format, which is the case here
    memoryview(sh)[:] = l
    return sh
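As an aside, a RawArray filled this way is still an ordinary sharedctypes object, so it can be handed to worker processes as usual; the memoryview trick only changes how you fill it. A minimal sketch (the double_in_place worker and the values are my own illustration, not part of the original benchmark):

```python
import ctypes as ct
import multiprocessing as mp
import multiprocessing.sharedctypes as sct

def double_in_place(arr, n):
    # Runs in the child; writes land in the shared memory the parent sees
    for i in range(n):
        arr[i] *= 2

def run_demo():
    sh = sct.RawArray(ct.c_int, [1, 2, 3, 4])  # initializer form also works
    p = mp.Process(target=double_in_place, args=(sh, len(sh)))
    p.start()
    p.join()
    return list(sh)

if __name__ == "__main__":
    print(run_demo())  # [2, 4, 6, 8]
```

Note that RawArray provides no synchronization; if multiple processes write concurrently, use multiprocessing.sharedctypes.Array, which wraps the same storage in a lock.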
In tests solving this problem on another question, the time to copy using a memoryview wrapper is less than 1% of the time required to copy with RawArray's normal slice assignment.
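Before relying on that, it helps to know that a memoryview over a RawArray exposes the buffer's layout, so you can sanity-check element sizes before attempting a bulk copy (a small sketch of my own, not from the original tests):

```python
import ctypes as ct
import multiprocessing.sharedctypes as sct

sh = sct.RawArray(ct.c_int, 5)
mv = memoryview(sh)
# itemsize and nbytes reflect the underlying C layout; a bulk copy requires
# the source buffer to have a matching element size and total length
print(mv.itemsize, mv.nbytes, len(mv))  # typically 4 20 5 (c_int is 4 bytes on mainstream platforms)
```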
One trick here is that the elements of the output of np.random.randint are np.int, and on a 64-bit system, np.int is 64 bits, so on 64-bit Python you need another round of copying to coerce it to the right size (or you need to declare the RawArray to be of a type that matches the size of np.int). Even if you do need to make that temporary copy, though, it's still much cheaper with a memoryview:
>>> l = np.random.randint(0, 10, size=100000)
>>> %time sh = sct.RawArray(ct.c_int, len(l))
Wall time: 472 µs # Creation is cheap
>>> %time sh[:] = l
Wall time: 14.4 ms # TOO LONG!
# Must convert to numpy array with matching element size when c_int and np.int don't match
>>> %time memoryview(sh)[:] = np.array(l, dtype=np.int32)
Wall time: 424 µs
As you can see, even when you need to copy the np.array to resize the elements first, the total time is less than 3% of the time required using RawArray's own slice assignment operator.
If you avoid the temporary copy by making the size of the RawArray match the source, the cost drops further:
# Make it 64 bit to match size of np.int on my machine
>>> %time sh = sct.RawArray(ct.c_int64, len(l))
Wall time: 522 µs # Creation still cheap, even at double the size
# No need to convert source array now:
>>> %time memoryview(sh)[:] = l
Wall time: 123 µs
which gets us down to 0.85% of the RawArray slice assignment time; at this point, you're basically running at memcpy speeds; the rest of your actual Python code will swamp the minuscule amount of time spent on data copying.