Demystifying sharedctypes performance

南方客 2021-02-13 05:16

In Python it is possible to share ctypes objects between multiple processes. However, I notice that allocating these objects seems to be extremely expensive.

Consider the following example:
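(The snippet is cut off here. As a rough sketch of the kind of benchmark under discussion — the name foo1 and the timing harness are placeholders, not the original code — something like this allocates a shared array and fills it with plain slice assignment:)

    import time
    import ctypes as ct
    import numpy as np
    from multiprocessing import sharedctypes as sct

    l = np.random.randint(0, 10, size=100000)

    def foo1():
        # Allocate shared memory, then copy element by element via slice assignment
        sh = sct.RawArray(ct.c_int, len(l))
        sh[:] = l
        return sh

    start = time.perf_counter()
    foo1()
    print(f"foo1: {time.perf_counter() - start:.4f} s")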

3 Answers
  •  温柔的废话
    2021-02-13 05:49

    Not an answer (the accepted answer explains this quite well), but for those looking for how to fix this, here's how: Don't use RawArray's slice assignment operator.

    As noted in the accepted answer, RawArray's slice assignment operator doesn't take advantage of the fact that you're copying between two wrappers around C-style arrays of identical type and size. But RawArray implements the buffer protocol, so you can wrap it in a memoryview to access it in an "even more raw" way (and it will make foo2 win, because you can only do this after constructing the object, not as part of construction):

    import ctypes as ct
    from multiprocessing import sharedctypes as sct

    def foo2():
        # assumes a module-level source array l (see the question)
        sh = sct.RawArray(ct.c_int, len(l))
        # l must be another buffer protocol object w/the same C format, which is the case here
        memoryview(sh)[:] = l
        return sh

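    A quick sanity check (my own snippet, not from the original answer), continuing from the foo2 above and assuming a source array whose element size matches c_int (4 bytes on mainstream platforms):

    import numpy as np

    # 32-bit elements to match c_int's (assumed) 4-byte size
    l = np.random.randint(0, 10, size=100000).astype(np.int32)
    sh = foo2()
    assert list(sh[:5]) == list(l[:5])  # contents were copied into shared memory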
    In tests solving this problem on another question, the time to copy using a memoryview wrapper was less than 1% of the time required to copy with RawArray's normal slice assignment. One catch here is that the elements of np.random.randint's output have type np.int, and on a 64-bit system np.int is 64 bits, so on 64-bit Python you need another round of copying to coerce it to the right size (or you need to declare the RawArray to be of a type that matches the size of np.int). Even if you do need to make that temporary copy, though, it's still much cheaper with a memoryview:

    >>> l = np.random.randint(0, 10, size=100000)
    >>> %time sh = sct.RawArray(ct.c_int, len(l))
    Wall time: 472 µs  # Creation is cheap
    
    >>> %time sh[:] = l
    Wall time: 14.4 ms  # TOO LONG!
    
    # Must convert to numpy array with matching element size when c_int and np.int don't match
    >>> %time memoryview(sh)[:] = np.array(l, dtype=np.int32)
    Wall time: 424 µs
    

    As you can see, even when you need to copy the np.array to resize the elements first, the total time is less than 3% of the time required using RawArray's own slice assignment operator.

    If you avoid the temporary copy by making the size of the RawArray match the source, the cost drops further:

    # Make it 64 bit to match size of np.int on my machine
    >>> %time sh = sct.RawArray(ct.c_int64, len(l))
    Wall time: 522 µs  # Creation still cheap, even at double the size
    
    # No need to convert source array now:
    >>> %time memoryview(sh)[:] = l
    Wall time: 123 µs
    

    which gets us down to 0.85% of the RawArray slice assignment time. At this point, you're basically running at memcpy speed; the rest of your actual Python code will swamp the minuscule amount of time spent on data copying.
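    For completeness, here's a hedged sketch (my own addition, not part of the original answer) of how the shared RawArray might then be consumed from a worker process; np.frombuffer gives a zero-copy numpy view over the shared buffer:

    import ctypes as ct
    import numpy as np
    from multiprocessing import Process, sharedctypes as sct

    def worker(sh):
        # Zero-copy numpy view over the shared memory block
        arr = np.frombuffer(sh, dtype=np.int64)
        print(arr[:5], arr.sum())

    if __name__ == '__main__':
        # dtype pinned to 64-bit so it matches c_int64 regardless of platform
        l = np.random.randint(0, 10, size=100000, dtype=np.int64)
        sh = sct.RawArray(ct.c_int64, len(l))
        memoryview(sh)[:] = l  # memcpy-speed copy, as above
        p = Process(target=worker, args=(sh,))
        p.start()
        p.join()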
