Python - vectorizing a sliding window

既然无缘 2021-01-05 17:16

I'm trying to vectorize a sliding-window operation. For the 1-D case, a helpful example could go along the lines of:

x= vstack((np.array([range(10)]),np.arr         


        
4 Answers
  • 2021-01-05 18:07

    There happens to be a function in SciPy that computes the mean over sliding windows very quickly: scipy.ndimage.uniform_filter. You can use it to implement your mean-of-neighbourhood function as follows:

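    import numpy as np
    from scipy.ndimage import uniform_filter

    def neighbourhood_average(arr, win=3):
        arr = np.asarray(arr, dtype=np.float64)  # uniform_filter keeps the input dtype
        # uniform_filter returns the window mean; scale it up to the window sum
        sums = uniform_filter(arr, win, mode='constant') * (win*win)
        return (sums - arr) / (win*win - 1)  # drop the centre, average the rest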
    

    This returns an array X where X[i,j] is the average of all neighbours of (i,j) in arr, excluding (i,j) itself. Note that the outer win//2 rows and columns (just the first and last row and column when win=3) are affected by the boundary conditions, and so may be invalid for your application; you can use mode= to control the boundary rule if necessary.

    Because uniform_filter is implemented in C, applied separably along each axis, and runs in time linear in the size of arr (essentially independent of win), it should outperform the other solutions, especially when win is large.
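    A minimal sketch that checks the function against a plain loop on the interior cells, assuming a SciPy version where uniform_filter is importable directly from scipy.ndimage. brute_force is a hypothetical helper written only for this comparison; only cells at least win//2 away from every edge are compared, since the others depend on mode=.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def neighbourhood_average(arr, win=3):
    # Cast first: uniform_filter keeps the input dtype, so integer
    # inputs would otherwise have their window means truncated
    arr = np.asarray(arr, dtype=np.float64)
    # uniform_filter gives the window mean; multiply by win*win for the window sum
    sums = uniform_filter(arr, win, mode='constant') * (win * win)
    # Drop the centre cell and average over the remaining win*win - 1 cells
    return (sums - arr) / (win * win - 1)


def brute_force(arr, win=3):
    # Hypothetical plain-loop reference, filled only where a full window fits
    arr = np.asarray(arr, dtype=np.float64)
    h = win // 2
    out = np.zeros_like(arr)
    for i in range(h, arr.shape[0] - h):
        for j in range(h, arr.shape[1] - h):
            window = arr[i - h:i + h + 1, j - h:j + h + 1]
            out[i, j] = (window.sum() - arr[i, j]) / (win * win - 1)
    return out


rng = np.random.default_rng(0)
a = rng.random((8, 9))
for win in (3, 5):
    h = win // 2
    fast = neighbourhood_average(a, win)
    slow = brute_force(a, win)
    # Compare interior cells only; the outer h rows/columns depend on mode=
    print(win, np.allclose(fast[h:-h, h:-h], slow[h:-h, h:-h]))
```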

  • 2021-01-05 18:08

    If I understand the problem correctly, you would like to take the mean of all the numbers one step around each index, excluding the index itself.

    I have patched your function so that it works; I believe you were going for something like this:

    import numpy as np

    def original(matriz):
        vector2 = matriz.flatten()

        nrows, ncols = matriz.shape
        vector = np.zeros(nrows*ncols, dtype='float64')

        # Interior pixels: skip the first/last column and the first/last row
        for i in range(vector.shape[0]):
            if (i % ncols) != 0 and (i+1) % ncols != 0 and i > ncols and i < ncols*(nrows-1):
                vector[i] = np.mean(np.array([vector2[i-ncols-1], vector2[i-ncols],
                                              vector2[i-ncols+1], vector2[i-1], vector2[i+1],
                                              vector2[i+ncols-1], vector2[i+ncols], vector2[i+ncols+1]]))

        # Restore the 2-D shape so the result can be compared with other methods
        return vector.reshape(nrows, ncols)
    

    I rewrote this using slicing and views:

    def mean_around(arr):
        arr = arr.astype(np.float64)

        out = np.copy(arr[:-2, :-2])  # Top-left neighbour
        out += arr[:-2, 2:]           # Top-right neighbour
        out += arr[:-2, 1:-1]         # Top-centre neighbour
        out += arr[2:, :-2]           # Bottom-left neighbour
        out += arr[2:, 2:]            # Bottom-right neighbour
        out += arr[2:, 1:-1]          # Bottom-centre neighbour
        out += arr[1:-1, 2:]          # Right neighbour
        out += arr[1:-1, :-2]         # Left neighbour

        out /= 8.0  # Divide by the number of neighbours to obtain the mean

        cout = np.empty_like(arr)  # Create the output array
        cout[1:-1, 1:-1] = out     # Fill the interior with the means
        cout[0, :] = 0; cout[-1, :] = 0; cout[:, 0] = 0; cout[:, -1] = 0  # Set the edges to zero

        return cout
    

    Using np.empty_like and then filling the edges seemed slightly faster than np.zeros_like. First, let's double-check that both functions give the same result on your matriz array:

    print(np.allclose(mean_around(matriz), original(matriz)))
    True
    
    print(mean_around(matriz))
    [[ 0.     0.     0.     0.     0.   ]
     [ 0.     2.5    2.75   3.125  0.   ]
     [ 0.     3.25   2.75   2.375  0.   ]
     [ 0.     1.875  2.     2.     0.   ]
     [ 0.     2.25   2.25   1.75   0.   ]
     [ 0.     0.     0.     0.     0.   ]]
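    The same 8-neighbour mean can also be written with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). This is a different technique from the slicing above: it builds a read-only view of every 3x3 window and sums each one. A minimal sketch; mean_around_windows is a hypothetical name, and the edges are set to zero to mirror mean_around.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mean_around_windows(arr):
    arr = np.asarray(arr, dtype=np.float64)
    # Read-only view of every 3x3 window, shape (nrows-2, ncols-2, 3, 3); no data copied
    windows = sliding_window_view(arr, (3, 3))
    # Sum each window, subtract its centre cell, divide by the 8 neighbours
    interior = (windows.sum(axis=(-2, -1)) - arr[1:-1, 1:-1]) / 8.0
    # Edge cells have no full neighbourhood, so leave them at zero
    out = np.zeros_like(arr)
    out[1:-1, 1:-1] = interior
    return out


# On a linear ramp, each interior cell equals the mean of its 8 neighbours
demo = np.arange(20, dtype=np.float64).reshape(4, 5)
print(mean_around_windows(demo))
```

    Summing the view still touches 9 values per cell, so for large arrays the slicing version above (or uniform_filter) is likely to be faster; the window view mainly makes the intent explicit.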
    

    Some timings:

    a=np.random.rand(500,500)
    
    print(np.allclose(original(a), mean_around(a)))
    True
    
    %timeit mean_around(a)
    100 loops, best of 3: 4.4 ms per loop
    
    %timeit original(a)
    1 loops, best of 3: 6.6 s per loop
    

    Roughly a 1500x speedup.

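    This looks like a good place to use numba:

    import numpy as np
    from numba import njit

    def mean_numba(arr):
        out = np.zeros_like(arr)
        nrows, ncols = arr.shape

        # Loop over interior cells only; the edges stay zero
        for x in range(1, nrows-1):
            for y in range(1, ncols-1):
                out[x, y] = (arr[x-1, y+1] + arr[x-1, y] + arr[x-1, y-1] + arr[x, y+1] +
                             arr[x, y-1] + arr[x+1, y+1] + arr[x+1, y] + arr[x+1, y-1]) / 8.
        return out

    # njit compiles the function on its first call, in nopython mode
    # (older numba versions spelled this autojit)
    nmean = njit(mean_numba)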
    

    Now let's compare all the presented methods:

    a=np.random.rand(5000,5000)
    
    %timeit mean_around(a)
    1 loops, best of 3: 729 ms per loop
    
    %timeit nmean(a)
    10 loops, best of 3: 169 ms per loop
    
    #CT Zhu's answer
    %timeit it_mean(a)
    1 loops, best of 3: 36.7 s per loop
    
    #Ali_m's answer
    %timeit fast_local_mean(a,(3,3))
    1 loops, best of 3: 4.7 s per loop
    
    #lmjohns3's answer
    %timeit scipy_conv(a)
    1 loops, best of 3: 3.72 s per loop
    

    A 4x speedup with numba is fairly modest, which suggests that the NumPy code is about as good as it's going to get. I took the other answers' code as presented, although I had to modify CT Zhu's answer to handle different array sizes.

  • 2021-01-05 18:13

    It sounds like you're trying to compute a 2D convolution. If you are able to use scipy, I would suggest trying scipy.signal.convolve2d:

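    import numpy as np
    import scipy.signal

    matriz = np.random.randn(10, 10)

    # to average a 3x3 neighborhood
    kernel = np.ones((3, 3), float)

    # to compute the mean, divide by size of neighborhood
    kernel /= kernel.sum()

    # mode='same' keeps the output the same shape as matriz (edges are zero-padded);
    # the default mode='full' would return a (12, 12) array here
    average = scipy.signal.convolve2d(matriz, kernel, mode='same')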
    

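    The reason this computes the mean of all 3x3 neighborhoods can be seen if you "unroll" convolve2d into its constituent loops. Restricting i and j to the positions where the kernel fits entirely inside matriz (which is what mode='valid' returns), it is effectively computing the following. Strictly speaking this is a correlation, because the kernel is not flipped, but that makes no difference for a symmetric kernel like this one:

    X, Y = kernel.shape
    average = np.zeros((matriz.shape[0] - X + 1, matriz.shape[1] - Y + 1))
    for i in range(average.shape[0]):
        for j in range(average.shape[1]):
            for ii in range(X):
                for jj in range(Y):
                    average[i, j] += kernel[ii, jj] * matriz[i+ii, j+jj]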
    

    So if every value in your kernel is 1/(1+1+1+1+1+1+1+1+1) == 1/9, you can rewrite the code above as:

    for i in range(matriz.shape[0] - X + 1):
        for j in range(matriz.shape[1] - Y + 1):
            average[i, j] = 1./9 * matriz[i:i+X, j:j+Y].sum()
    

    This is exactly the average of the values in matriz over the 3x3 area whose top-left corner is at (i, j).
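    A minimal sketch that runs the unrolled loop on the positions where the whole 3x3 kernel fits and compares it with convolve2d(..., mode='valid'). The random test matrix is an assumption made only for this check.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
matriz = rng.standard_normal((10, 10))
kernel = np.ones((3, 3)) / 9.0
X, Y = kernel.shape

# Explicit loop, restricted to positions where the window stays inside matriz
loop_avg = np.zeros((matriz.shape[0] - X + 1, matriz.shape[1] - Y + 1))
for i in range(loop_avg.shape[0]):
    for j in range(loop_avg.shape[1]):
        loop_avg[i, j] = (kernel * matriz[i:i + X, j:j + Y]).sum()

# 'valid' keeps only fully overlapping positions, so both results are 8x8
conv_avg = convolve2d(matriz, kernel, mode='valid')
print(conv_avg.shape, np.allclose(loop_avg, conv_avg))
```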

    One advantage of doing things this way is that you can easily change the weights associated with your neighborhood by setting the values in your kernel appropriately. For example, if you wanted to give the center value in each neighborhood twice as much weight as the others, you could build your kernel like this:

    kernel = np.ones((3, 3), float)
    kernel[1, 1] = 2.
    kernel /= kernel.sum()
    

    and the convolution code would remain the same, but the computation would yield a different type of average (a "center-weighted" one). There are a lot of possibilities here; hopefully this provides a nice abstraction for the task you're doing.
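    A short sketch of swapping kernels. The center-weighted kernel is the one described above; the "ring" kernel (center weight 0) is an illustrative addition that reproduces the mean of the 8 neighbours, excluding the center, that the other answers compute. mode='valid' is used so that only cells with a full neighborhood are returned.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(2)
matriz = rng.standard_normal((6, 7))

# Center-weighted kernel: the center counts twice and the weights sum to 1
centre_weighted = np.ones((3, 3))
centre_weighted[1, 1] = 2.0
centre_weighted /= centre_weighted.sum()

# Illustrative "ring" kernel: center weight 0, so only the 8 neighbours count
ring = np.ones((3, 3))
ring[1, 1] = 0.0
ring /= ring.sum()

weighted = convolve2d(matriz, centre_weighted, mode='valid')
neighbours = convolve2d(matriz, ring, mode='valid')

# Check the first interior cell, matriz[1, 1], against a direct computation
block = matriz[0:3, 0:3]
print(np.isclose(weighted[0, 0], (block.sum() + block[1, 1]) / 10.0))
print(np.isclose(neighbours[0, 0], (block.sum() - block[1, 1]) / 8.0))
```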

  • 2021-01-05 18:16

    The problem lies in x[1,x[0,:]+1]: the index array for the 2nd axis, x[0,:]+1, is [1 2 3 4 5 6 7 8 9 10], and index 10 is out of bounds for an axis of length 10.

    In the case of x[1,x[0,:]-1], the index array for the 2nd axis is [-1 0 1 2 3 4 5 6 7 8], so you end up getting [9 0 1 2 3 4 5 6 7 8]: negative indices count from the end, so -1 refers to the last element (9), -2 to the second-to-last element, and so on.

    With np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:]) and x[0,:]=[0 1 2 3 4 5 6 7 8 9], what essentially happens is that the first cell is taken from x[1,:], because x[0,0] is 0 and so (x[0,:]<5)&(x[0,:]>0) is False there. The next four elements are taken from x[1,x[0,:]-1], and the rest from x[1,:]. The final result is [0 0 1 2 3 5 6 7 8 9].
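    A minimal sketch reproducing the indexing behaviour described above. The question's definition of x is truncated, so this assumes x has two rows, each equal to 0..9, which is consistent with the outputs shown in this answer.

```python
import numpy as np

# Assumption: the question's (truncated) x has two rows, each equal to 0..9
x = np.vstack((np.arange(10), np.arange(10)))

# x[0, :] + 1 contains 10, which is out of bounds for an axis of length 10
try:
    x[1, x[0, :] + 1]
except IndexError as err:
    print("IndexError:", err)

# x[0, :] - 1 starts with -1, and negative indices count from the end
print(x[0, :] - 1)
print(x[1, x[0, :] - 1])

# np.where takes the shifted values only where 0 < x[0] < 5
print(np.where((x[0, :] < 5) & (x[0, :] > 0), x[1, x[0, :] - 1], x[1, :]))
```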

    It may appear to work for a shift of just one cell, but it will surprise you when you try to shift by two cells:

    >>> np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-2],x[1,:])
    array([0, 9, 0, 1, 2, 5, 6, 7, 8, 9])
    

    For this specific problem, if we want to keep everything on one line, this will do:

    >>> for i in [1, 2, 3, 4, 5, 6]:
    ...     print(np.hstack((np.where(x[1,x[0,:]-i]<x[0,-i], x[1,x[0,:]-i], 0)[:5], x[0,5:])))
    ...
    
    [0 0 1 2 3 5 6 7 8 9]
    [0 0 0 1 2 5 6 7 8 9]
    [0 0 0 0 1 5 6 7 8 9]
    [0 0 0 0 0 5 6 7 8 9]
    [0 0 0 0 0 5 6 7 8 9]
    [0 0 0 0 0 5 6 7 8 9]
    

    Edit: Now that I understand your original question better: basically, you want to take a 2D array and calculate the N*N average around each cell. That is quite a common operation. First, you probably want to limit N to odd numbers; otherwise an average over, say, a 2*2 area around a cell is hard to define. Suppose we want a 3*3 average:

    >>> import itertools
    >>> import numpy as np
    >>> # In this example, the shape is (10, 10)
    >>> a1 = np.array([[3, 7, 0, 9, 0, 8, 1, 4, 3, 3],
    ...                [5, 6, 5, 2, 9, 2, 3, 5, 2, 9],
    ...                [0, 9, 8, 5, 3, 1, 8, 1, 9, 4],
    ...                [7, 4, 0, 0, 9, 3, 3, 3, 5, 4],
    ...                [3, 1, 2, 4, 8, 8, 2, 1, 9, 6],
    ...                [0, 0, 3, 9, 3, 0, 9, 1, 3, 3],
    ...                [1, 2, 7, 4, 6, 6, 2, 6, 2, 1],
    ...                [3, 9, 8, 5, 0, 3, 1, 4, 0, 5],
    ...                [0, 3, 1, 4, 9, 9, 7, 5, 4, 5],
    ...                [4, 3, 8, 7, 8, 6, 8, 1, 1, 8]])
    >>> # Move the original array a1 around (edge indices are clipped);
    >>> # use range(-2, 3) for a 5*5 average, and so on
    >>> movea1 = [a1[np.clip(np.arange(10)+i, 0, 9)][:, np.clip(np.arange(10)+j, 0, 9)] for i, j in itertools.product(*[range(-1, 2)]*2)]
    >>> # Then just take the average
    >>> averagea1 = np.mean(np.array(movea1), axis=0)
    >>> # Trim N//2 cells from each edge, because those cells do not have a full N*N neighbourhood
    >>> averagea1[1:10-1, 1:10-1]
    array([[ 4.77777778,  5.66666667,  4.55555556,  4.33333333,  3.88888889,
         3.66666667,  4.        ,  4.44444444],
       [ 4.88888889,  4.33333333,  4.55555556,  3.77777778,  4.55555556,
         3.22222222,  4.33333333,  4.66666667],
       [ 3.77777778,  3.66666667,  4.33333333,  4.55555556,  5.        ,
         3.33333333,  4.55555556,  4.66666667],
       [ 2.22222222,  2.55555556,  4.22222222,  4.88888889,  5.        ,
         3.33333333,  4.        ,  3.88888889],
       [ 2.11111111,  3.55555556,  5.11111111,  5.33333333,  4.88888889,
         3.88888889,  3.88888889,  3.55555556],
       [ 3.66666667,  5.22222222,  5.        ,  4.        ,  3.33333333,
         3.55555556,  3.11111111,  2.77777778],
       [ 3.77777778,  4.77777778,  4.88888889,  5.11111111,  4.77777778,
         4.77777778,  3.44444444,  3.55555556],
       [ 4.33333333,  5.33333333,  5.55555556,  5.66666667,  5.66666667,
         4.88888889,  3.44444444,  3.66666667]])
    

    I don't think you need to flatten your 2D array; that only causes confusion. Also, if you want to handle the edge elements differently rather than just trimming them away, consider building masked arrays with np.ma in the "move your original array around" step.
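    A minimal sketch generalising the shift-and-average steps to any odd N; shifted_mean is a hypothetical name, and the random input stands in for a1. For an N*N window the offsets run from -(N//2) to N//2 inclusive, i.e. range(-h, h + 1), and h = N//2 rows and columns are trimmed from each side.

```python
import itertools

import numpy as np


def shifted_mean(a, n=3):
    """N*N mean around each cell, computed by averaging shifted copies of a."""
    if n % 2 != 1:
        raise ValueError("n must be odd")
    h = n // 2
    rows, cols = a.shape
    shifted = []
    for i, j in itertools.product(range(-h, h + 1), repeat=2):
        # np.clip repeats the edge rows/columns instead of wrapping around
        r = np.clip(np.arange(rows) + i, 0, rows - 1)
        c = np.clip(np.arange(cols) + j, 0, cols - 1)
        shifted.append(a[r][:, c])
    mean = np.mean(np.array(shifted), axis=0)
    # Only cells at least h away from every edge have a true n*n neighbourhood
    return mean[h:rows - h, h:cols - h]


rng = np.random.default_rng(3)
a1 = rng.integers(0, 10, size=(10, 10))
print(shifted_mean(a1, 3).shape, shifted_mean(a1, 5).shape)
```

    This keeps N*N shifted copies in memory at once, so for large N a filter such as scipy.ndimage.uniform_filter (on a float copy, with mode='nearest' to mimic the clipping) is likely to be much cheaper.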
