Fast way to take average of every N rows in a .npy array

清酒与你 2021-01-05 18:32

I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray.

3 answers
  • 2021-01-05 19:11
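    import numpy as np
    
    def av(array, n=2):
        # Split the rows into groups of n: (rows, cols) -> (rows // n, n, cols).
        # Requires the number of rows to be a multiple of n.
        grouped = array.reshape(array.shape[0] // n, n, array.shape[1])
        # Sum each group and divide by the group size n (not by the column count).
        return np.sum(grouped, axis=1) / n
    
    a = np.array([[1,1],[2,2],[3,3],[4,4]])
    
    print(av(a))
    
    >> [[1.5 1.5]
        [3.5 3.5]]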
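    The question mentions a masked array. The same reshape-and-average idea should carry over to np.ma.MaskedArray if you call the array's own .mean(), which leaves masked entries out of each group's average. A minimal sketch, assuming the row count is a multiple of n; grouped_mean and the sample data are hypothetical:

```python
import numpy as np

def grouped_mean(arr, n=2):
    # Average every n consecutive rows; assumes arr.shape[0] is a multiple of n.
    # For a np.ma.MaskedArray, reshape keeps the mask and .mean() skips masked entries.
    rows, cols = arr.shape
    return arr.reshape(rows // n, n, cols).mean(axis=1)

# Hypothetical data: the value 2.0 in the first column of row 1 is masked out.
data = np.ma.masked_array(
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]],
    mask=[[False, False], [True, False], [False, False], [False, False]],
)
print(grouped_mean(data))
```

    Here the first group's first column averages only the unmasked 1.0, while its second column averages 1.0 and 2.0. If every value of a column in a group is masked, that result entry is masked as well.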
    
  • 2021-01-05 19:29

    The mean of two values a and b is 0.5*(a+b), so you can do it like this:

    newArray = 0.5*(originalArray[0::2] + originalArray[1::2])
    

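    This adds each pair of consecutive rows (rows 0 and 1, rows 2 and 3, ...) and then multiplies every element by 0.5. It assumes an even number of rows: with an odd count, originalArray[0::2] has one more row than originalArray[1::2], so drop the last row first (for example with originalArray[:-1]) or average it separately.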
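    The slicing idea also generalizes to N rows: the k-th row of every group is originalArray[k::N], so adding the N interleaved slices gives the group sums. A minimal sketch (strided_avg is a hypothetical name), assuming incomplete trailing rows should simply be dropped:

```python
import numpy as np

def strided_avg(arr, n=2):
    # Keep only complete groups: trailing rows that cannot fill a group of n are dropped.
    usable = (arr.shape[0] // n) * n
    trimmed = arr[:usable]
    # trimmed[k::n] holds the k-th row of every group; adding the n slices
    # yields one sum per group, which is then divided by n.
    return sum(trimmed[k::n] for k in range(n)) / n

a = np.arange(10).reshape(5, 2)
print(strided_avg(a, 2))  # the fifth row is dropped
```

    For n=2 this is the same as 0.5*(trimmed[0::2] + trimmed[1::2]). It loops over n slices in Python, so it suits small n best.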

    Since the title asks for the average over N rows, here is a more general solution:

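    def groupedAvg(myArray, N=2):
        # Running sum along the rows, sampled at the last row of each group and scaled by 1/N
        result = np.cumsum(myArray, 0)[N-1::N]/float(N)
        # Differences of consecutive samples give each group's own sum (already divided by N)
        result[1:] = result[1:] - result[:-1]
        return result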
    

    The average of n elements is sum([x1, x2, ..., xn])/n. The sum of elements m through m+n-1 of a vector v equals cumsum(v)[m+n-1] minus cumsum(v)[m-1], unless m is 0, in which case there is nothing to subtract (that is result[0]).
    That is what we take advantage of here. Since everything is linear, it does not matter at which point we divide by N; here it is done before taking the differences, but that is just a matter of taste.
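    A quick numeric check of that identity on a small hypothetical vector, with m = 2 and n = 3:

```python
import numpy as np

v = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
c = np.cumsum(v)            # c[i] = v[0] + ... + v[i] -> [3, 4, 8, 9, 14, 23]
m, n = 2, 3                 # window of n elements starting at index m

window_sum = v[m:m + n].sum()          # v[2] + v[3] + v[4] = 4 + 1 + 5 = 10
via_cumsum = c[m + n - 1] - c[m - 1]   # c[4] - c[1] = 14 - 4 = 10
first_window = c[n - 1]                # m = 0: nothing to subtract, v[0] + v[1] + v[2] = 8

print(window_sum, via_cumsum, first_window)
```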

    If the last group has fewer than N rows, it is ignored completely. If you don't want to ignore it, you have to treat the last group specially:

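    def avg(myArray, N=2):
        cum = np.cumsum(myArray,0)
        # Averages of all complete groups, computed as in groupedAvg
        result = cum[N-1::N]/float(N)
        result[1:] = result[1:] - result[:-1]
    
        # Rows left over after the last complete group
        remainder = myArray.shape[0] % N
        if remainder != 0:
            if remainder < myArray.shape[0]:
                # Sum of the last `remainder` rows, divided by their count
                lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
            else:
                # Fewer than N rows in total: the whole array is one partial group
                lastAvg = cum[-1]/float(remainder)
            result = np.vstack([result, lastAvg])
    
        return result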
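    A different technique that handles a short last group without special-casing is np.add.reduceat, which sums the slices between given start indices. This is a swapped-in alternative, not part of the answer above; a sketch for plain ndarrays (reduceat_avg is a hypothetical name):

```python
import numpy as np

def reduceat_avg(arr, n=2):
    rows = arr.shape[0]
    # Start row of every group; the last group may hold fewer than n rows.
    starts = np.arange(0, rows, n)
    # Row i of sums is arr[starts[i]:starts[i+1]].sum(axis=0); the last one runs to the end.
    sums = np.add.reduceat(arr, starts, axis=0)
    # Actual number of rows in each group (n, except possibly the last).
    counts = np.diff(np.append(starts, rows))
    return sums / counts[:, None]

a = np.arange(10).reshape(5, 2)
print(reduceat_avg(a, 2))
```

    Because it sums each group directly instead of subtracting large running totals, it also sidesteps the rounding error that can creep into differences of long floating-point cumsums.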
    
  • 2021-01-05 19:36

    Your problem (average of every two rows with two columns):

    >>> a = np.reshape(np.arange(12),(6,2))
    >>> a
    array([[ 0,  1],
           [ 2,  3],
           [ 4,  5],
           [ 6,  7],
           [ 8,  9],
           [10, 11]])
    >>> a.transpose().reshape(-1,2).mean(1).reshape(2,-1).transpose()
    array([[  1.,   2.],
           [  5.,   6.],
           [  9.,  10.]])
    

    Other dimensions (average of every four rows with three columns):

    >>> a = np.reshape(np.arange(24),(8,3))
    >>> a
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11],
           [12, 13, 14],
           [15, 16, 17],
           [18, 19, 20],
           [21, 22, 23]])
    >>> a.transpose().reshape(-1,4).mean(1).reshape(3,-1).transpose()
    array([[  4.5,   5.5,   6.5],
           [ 16.5,  17.5,  18.5]])
    

    General formula for averaging every r rows of a 2D array a with c columns (the number of rows must be a multiple of r, otherwise one of the reshapes raises a ValueError):

    a.transpose().reshape(-1,r).mean(1).reshape(c,-1).transpose()
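    The formula can be wrapped in a function and checked against a direct 3-D reshape that groups the rows, (rows, c) -> (rows // r, r, c); transpose_avg is a hypothetical name:

```python
import numpy as np

def transpose_avg(a, r):
    # The general formula above, with c read from the array's shape.
    # Assumes a.shape[0] is a multiple of r; otherwise a reshape raises ValueError.
    c = a.shape[1]
    return a.transpose().reshape(-1, r).mean(1).reshape(c, -1).transpose()

a = np.arange(24).reshape(8, 3)
# The same result, obtained by grouping the rows directly with a 3-D reshape.
direct = a.reshape(-1, 4, 3).mean(axis=1)
print(np.allclose(transpose_avg(a, 4), direct))
```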
    