Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?

前端 未结 3 539
悲哀的现实
悲哀的现实 2021-01-22 19:43

Let\'s say I want to implement Numpy\'s

x[:] += 1

in Cython. I could write

@cython.boundscheck(False)
@cython.wraparoundcheck(F         


        
相关标签:
3条回答
  • 2021-01-22 20:16

    With numpy.ravel

    numpy.ravel(a, order='C')

    Return a contiguous flattened array.

    A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def add1_ravel(np.ndarray xs):
        cdef unsigned long i
        cdef double[::1] aview
    
        narray = xs.ravel()
        aview = narray
    
        for i in range(aview.shape[0]):
            aview[i] += 1
    
        # return xs if the memory is shared
        if not narray.flags['OWNDATA'] or np.may_share_memory(xs, narray):
            return xs
    
        # otherwise new array reshaped
        shape = tuple(xs.shape[i] for i in range(xs.ndim))
        return narray.reshape(shape)
    
    0 讨论(0)
  • 2021-01-22 20:27

    http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

    Is a nice tutorial on using nditer. It ends with a cython version. nditer is meant to be the all purpose array(s) iterator in numpy c level code.

    There are also good array examples on the Cython memoryview page:

    http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

    http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html

    The data buffer of an ndarray is a flat buffer. So regardless of the array's shape and strides, you can iterate over the whole buffer in a flat c pointer fashion. But things like nditer and memoryview take care of the element size details. So in c level coding it is actually easier to step through all the elements in a flat fashion than it is to step by rows - going by rows has to take into account the row stride.

    This runs in Python, and I think it will translate nicely into cython (I don't have that setup on my machine at the moment):

    import numpy as np
    
    def add1(x):
       it = np.nditer([x], op_flags=[['readwrite']])
       for i in it:
           i[...] += 1
       return it.operands[0]
    
    x = np.arange(10).reshape(2,5)
    y = add1(x)
    print(x)
    print(y)
    

    https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx is a sum-of-products script that I wrote a while back to simulate einsum.

    The core of its w = sop(x,y) calculation is:

    it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
    it.operands[nop][...] = 0
    it.reset()
    for xarr, yarr, warr in it:
        x = xarr
        y = yarr
        w = warr
        size = x.shape[0]
        for i in range(size):
           w[i] = w[i] + x[i] * y[i]
    return it.operands[nop]
    

    ===================

    copying ideas from the nditer.html document, I got a version of add1 that is only half the speed of the native numpy a+1. The naive nditer (above) isn't much faster in cython than in python. A lot of the speedup may be due to the external loop.

    @cython.boundscheck(False)
    def add11(arg):
       cdef np.ndarray[double] x
       cdef int size
       cdef double value
       it = np.nditer([arg],
            flags=['external_loop','buffered'], 
            op_flags=[['readwrite']])
       for xarr in it:
           x = xarr
           size = x.shape[0]
           for i in range(size):
               x[i] = x[i]+1.0
       return it.operands[0]
    

    I also coded this nditer in python with a print of size, and found that it iterated on your b with 78 blocks of size 8192 - that's a buffer size, not some characteristic of b and its data layout.

    In [15]: a = np.zeros((1000, 1000)) 
    In [16]: b = a[100:-100, 100:-100]
    
    In [17]: timeit add1.add11(b)
    100 loops, best of 3: 4.48 ms per loop
    
    In [18]: timeit b[:] += 1
    100 loops, best of 3: 8.76 ms per loop
    
    In [19]: timeit add1.add1(b)    # for the unbuffered nditer 
    1 loop, best of 3: 3.1 s per loop
    
    In [21]: timeit add1.add11(a)
    100 loops, best of 3: 5.44 ms per loop
    
    0 讨论(0)
  • 2021-01-22 20:39

    You could try using the flat attribute of the ndarray, which provides an iterator over the flattened array object. It always iterates in C-major ordering, with the last index varying the fastest. Something like:

    for i in range(x.size):
        x.flat[i] += 1
    
    0 讨论(0)
提交回复
热议问题