How to np.roll() faster?

前端 未结 1 789
囚心锁ツ
囚心锁ツ 2020-12-21 22:26

I\'m using np.roll() to do nearest-neighbor-like averaging, but I have a feeling there are faster ways. Here is a simplified example, but imagine 3 dimensions and more compl

相关标签:
1条回答
  • 2020-12-21 22:51

    np.roll has to create a copy of the array each time, which is why it is (comparatively) slow. Convolution with something like scipy.ndimage.filters.convolve() will be a bit faster, but it may still create copies (depending on the implementation).

    In this particular case, we can avoid copying altogether by using numpy views and padding the original array in the beginning.

    import numpy as np
    
    
    def no_copy_roll(nx, ny):
        phi_padded = np.zeros((ny+2, nx+2))
        # these are views into different sub-arrays of phi_padded
        # if two sub-array overlap, they share memory
        phi_north = phi_padded[:-2, 1:-1]
        phi_east = phi_padded[1:-1, 2:]
        phi_south = phi_padded[2:, 1:-1]
        phi_west = phi_padded[1:-1, :-2]
        phi = phi_padded[1:-1, 1:-1]
    
        do_me = np.zeros_like(phi, dtype='bool')
        do_me[1:-1, 1:-1] = True
    
        x0, y0, r0 = 40, 65, 12
        x = np.arange(nx, dtype='float')[None, :]
        y = np.arange(ny, dtype='float')[:, None]
        rsq = (x-x0)**2 + (y-y0)**2
        circle = rsq <= r0**2
        phi[circle] = 1.0
        do_me[circle] = False
    
        n, nper = 100, 100
        phi_hold = np.zeros((n+1, ny, nx))
        phi_hold[0] = phi
        for i in range(n):
            for j in range(nper):
                phi2 = 0.25*(phi_south +
                             phi_north +
                             phi_east +
                             phi_west)
    
                phi[do_me] = phi2[do_me]
    
            phi_hold[i+1] = phi
    
        return phi_hold
    

    This will cut about 35% off the time for a simple benchmark like

    from original import original_roll
    from mwe import no_copy_roll
    import numpy as np
    
    nx, ny = (301, 301)
    arr1 = original_roll(nx, ny)
    arr2 = no_copy_roll(nx, ny)
    
    assert np.allclose(arr1, arr2)
    

    here is my profiling result

    37.685 <module>  timing.py:1
    ├─ 22.413 original_roll  original.py:4
    │  ├─ 15.056 [self]
    │  └─ 7.357 roll  <__array_function__ internals>:2
    │     └─ 7.243 roll  numpy\core\numeric.py:1110
    │           [10 frames hidden]  numpy
    ├─ 14.709 no_copy_roll  mwe.py:4
    └─ 0.393 allclose  <__array_function__ internals>:2
       └─ 0.393 allclose  numpy\core\numeric.py:2091
             [2 frames hidden]  numpy
                0.391 isclose  <__array_function__ internals>:2
                └─ 0.387 isclose  numpy\core\numeric.py:2167
                      [4 frames hidden]  numpy
    

    For more elaborate stencils this approach still works, but it can get a bit unwieldy. In this case you can take a look at skimage.util.view_as_windows, which uses a variation of this trick (numpy stride tricks) to return an array that gives you cheap access to a window around each element. You will, however, have to do your own padding, and will need to take care to not create copies of the resulting array, which can get expensive fast.

    0 讨论(0)
提交回复
热议问题