Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?

Asked by 悲哀的现实 on 2021-01-22 19:43

Let's say I want to implement NumPy's

x[:] += 1

in Cython. I could write

@cython.boundscheck(False)
@cython.wraparound(False)
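
The question is cut off here. Presumably it continued with a per-element loop over a 1-D typed array, something like this sketch (assuming a float64 array; not the original poster's exact code):

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def add1(np.ndarray[np.float64_t, ndim=1] x):
    # Works only for 1-D arrays; the question is how to do the same
    # for an array of arbitrary shape, iterated as if it were flat.
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        x[i] += 1.0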


        
3 Answers

广开言路 answered 2021-01-22 20:27

    http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

    This is a nice tutorial on using nditer, and it ends with a Cython version. nditer is meant to be the all-purpose array iterator in NumPy C-level code.

    There are also good array examples on the Cython memoryview page, and the C-level iterator API is documented in the NumPy reference:

    http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

    http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html

    The data buffer of an ndarray is flat. So regardless of the array's shape and strides, you can iterate over the whole buffer in a flat, C-pointer fashion, while things like nditer and memoryviews take care of the element-size details. In C-level code it is actually easier to step through all the elements in a flat fashion than it is to step by rows, since going by rows has to take the row stride into account.
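
    As a concrete illustration of that flat stepping (my sketch, not taken from the linked pages; it assumes a C-contiguous float64 array, so it would not help with a strided view), a raveled 1-D memoryview over the same buffer lets a single index cover the whole array:

    import numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def add1_flat(double[:, ::1] arr):
        # np.asarray wraps the memoryview's buffer without copying, and
        # ravel() of a C-contiguous array is a view, so this 1-D loop
        # writes straight into the caller's 2-D array.
        cdef double[::1] flat = np.asarray(arr).ravel()
        cdef Py_ssize_t i
        for i in range(flat.shape[0]):
            flat[i] += 1.0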

    This runs in Python, and I think it will translate nicely into Cython (I don't have that set up on my machine at the moment):

    import numpy as np

    def add1(x):
        # 'readwrite' lets the loop modify each element in place
        it = np.nditer([x], op_flags=[['readwrite']])
        for i in it:
            i[...] += 1
        return it.operands[0]

    x = np.arange(10).reshape(2, 5)
    y = add1(x)
    print(x)
    print(y)
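
    Newer NumPy documentation recommends using nditer as a context manager whenever an operand is written (and write-back actually requires it once buffering or copying is involved). The same loop in that style would look like this (my adaptation, not part of the original answer):

    def add1_ctx(x):
        with np.nditer([x], op_flags=[['readwrite']]) as it:
            for i in it:
                i[...] += 1
            # read the operands before the iterator closes
            return it.operands[0]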
    

    https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx is a sum-of-products script that I wrote a while back to simulate einsum.

    The core of its w = sop(x,y) calculation is:

    # Excerpt: ops, flags, op_flags, op_axes, order and nop are built
    # earlier in sop.pyx from the einsum-style subscripts.
    it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
    it.operands[nop][...] = 0    # zero the allocated output before iterating
    it.reset()
    for xarr, yarr, warr in it:
        x = xarr
        y = yarr
        w = warr
        size = x.shape[0]
        for i in range(size):
            w[i] = w[i] + x[i] * y[i]
    return it.operands[nop]
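
    For context, a much simpler iterator of the same general shape, an elementwise w = x*y with nditer allocating the output and handing out flat chunks, could be set up like this (my sketch; the real script derives flags and op_axes from the einsum-style subscripts):

    import numpy as np

    x = np.arange(6.).reshape(2, 3)
    y = np.full((2, 3), 2.)
    it = np.nditer([x, y, None],
                   flags=['external_loop'],
                   op_flags=[['readonly'], ['readonly'],
                             ['writeonly', 'allocate']])
    for xa, ya, wa in it:
        wa[...] = xa * ya        # each chunk is a flat 1-D slice of the operands
    print(it.operands[2])        # the allocated (2, 3) product array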
    

    ===================

    Copying ideas from the nditer document, I got a version of add1 that is only half the speed of the native NumPy a+1. The naive nditer version (above) isn't much faster in Cython than in Python; a lot of the speedup may be due to the external_loop flag.

    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    def add11(arg):
        cdef np.ndarray[double] x
        cdef int size
        cdef Py_ssize_t i
        it = np.nditer([arg],
             flags=['external_loop', 'buffered'],
             op_flags=[['readwrite']])
        for xarr in it:
            x = xarr                 # each chunk arrives as a flat 1-D double array
            size = x.shape[0]
            for i in range(size):
                x[i] = x[i] + 1.0
        return it.operands[0]
    

    I also coded this nditer in Python with a print of size, and found that it iterated over your b in 78 blocks of size 8192; that's the buffer size, not some characteristic of b and its data layout.
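
    A quick way to see those chunk sizes (my check, not from the answer; the exact counts depend on NumPy's default buffersize):

    b = np.zeros((1000, 1000))[100:-100, 100:-100]
    it = np.nditer([b], flags=['external_loop', 'buffered'],
                   op_flags=[['readonly']])
    sizes = [chunk.shape[0] for chunk in it]
    print(len(sizes), max(sizes))    # mostly chunks of 8192 elements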

    In [15]: a = np.zeros((1000, 1000)) 
    In [16]: b = a[100:-100, 100:-100]
    
    In [17]: timeit add1.add11(b)
    100 loops, best of 3: 4.48 ms per loop
    
    In [18]: timeit b[:] += 1
    100 loops, best of 3: 8.76 ms per loop
    
    In [19]: timeit add1.add1(b)    # for the unbuffered nditer 
    1 loop, best of 3: 3.1 s per loop
    
    In [21]: timeit add1.add11(a)
    100 loops, best of 3: 5.44 ms per loop
    
