Elements Arrangement For Calculating Mean In Python/NumPy

离开以前 2020-12-21 08:15

I have a 1-D list as follows:

data = [1,5,9,13,
        2,6,10,14,
        3,7,11,15,
        4,8,12,16]

I want to make the following list of

3 answers
  • 2020-12-21 08:33

    Here are a few suggested solutions:

    import numpy as np

    def grouped_mean(data, M2, N1, N2):
    
        # Parameters:
        # M2 = number of columns in the input data
        # N1, N2 = block size (rows, columns) into which the data is divided and averaged
    
        # Sum each run of N2 adjacent values, then sum N1 of those partial sums per block
        # and divide by the block size; M2 // N2 keeps the reshape size an integer in Python 3
        grouped_mean = np.array(data).reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)
    
        # Return transposed and flattened version as output (as per OP)
        return grouped_mean.T.ravel()
    
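To see why the two-stage sum in grouped_mean produces block means, here is a sketch that exposes the intermediate result for the 4x4 sample with 2x2 blocks. The helper name grouped_mean_steps is hypothetical (not from the answer), and `//` is used because `M2/N2` is a float under Python 3, which reshape rejects.

```python
import numpy as np

def grouped_mean_steps(data, M2, N1, N2):
    """Hypothetical helper: the two-stage block sum from grouped_mean, step by step."""
    a = np.asarray(data)
    # Stage 1: every row of M2 values splits into M2 // N2 runs of N2 adjacent
    # columns; summing each run gives one partial sum per (row, block column).
    stage1 = a.reshape(-1, N2).sum(axis=1)
    # Stage 2: reshape to (block row, row within block, block column);
    # summing axis 1 adds the N1 partial sums that belong to the same block.
    stage2 = stage1.reshape(-1, N1, M2 // N2).sum(axis=1)
    # Divide by the block size; rows = block rows, columns = block columns.
    return stage1, stage2 / (N1 * N2)

data = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
stage1, means = grouped_mean_steps(data, M2=4, N1=2, N2=2)
print(stage1)            # [ 6 22  8 24 10 26 12 28]
print(means)             # 2x2 array of block means
print(means.T.ravel())   # [ 3.5  5.5 11.5 13.5]
```

The final `.T.ravel()` is what turns the block-row-major 2x2 result into the ordering shown in the sample run.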
    

    Alternatively, grouped_mean can be calculated with np.einsum instead of np.sum:

    stage1_sum = np.einsum('ij->i', np.array(data).reshape(-1, N2))
    grouped_mean = np.einsum('ijk->ik', stage1_sum.reshape(-1, N1, M2 // N2)) / (N1 * N2)
    

    Or, split the 2D input array into a 4D array, as suggested in @Warren Weckesser's solution, and then use np.einsum:

    split_data = np.array(data).reshape(-1, N1, M2 // N2, N2)
    grouped_mean = np.einsum('ijkl->ik', split_data) / (N1 * N2)
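A quick consistency check, as a sketch: the np.sum version and both np.einsum variants should give the same block means, which can be compared against an explicit per-block loop. The 6 x 8 input, the 3 x 2 block size and the fixed seed are arbitrary assumptions chosen so the blocks are not square.

```python
import numpy as np

# Arbitrary test setup: 6 rows x 8 columns, divided into 3 x 2 blocks
M2, N1, N2 = 8, 3, 2
rng = np.random.default_rng(0)
data = rng.integers(0, 9, size=6 * M2).tolist()
a = np.array(data)

# Approach 1: two-stage np.sum
via_sum = a.reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)

# Approach 2: the same two stages written with np.einsum
stage1 = np.einsum('ij->i', a.reshape(-1, N2))
via_einsum2 = np.einsum('ijk->ik', stage1.reshape(-1, N1, M2 // N2)) / (N1 * N2)

# Approach 3: one np.einsum over a 4-D view (block row, row, block col, col)
via_einsum4 = np.einsum('ijkl->ik', a.reshape(-1, N1, M2 // N2, N2)) / (N1 * N2)

# Reference: average each block of the 2-D array with explicit slicing
a2 = a.reshape(-1, M2)
ref = np.array([[a2[r:r + N1, c:c + N2].mean() for c in range(0, M2, N2)]
                for r in range(0, a2.shape[0], N1)])

print(np.allclose(via_sum, ref), np.allclose(via_einsum2, ref), np.allclose(via_einsum4, ref))
```

All three return an array of shape (block rows, block columns); apply `.T.ravel()` afterwards if you want the flattened ordering from grouped_mean.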
    

    Sample run -

    In [182]: data = np.array([[1,5,9,13],
         ...:                  [2,6,10,14],
         ...:                  [3,7,11,15],
         ...:                  [4,8,12,16]])
    
    In [183]: grouped_mean(data,4,2,2)
    Out[183]: array([  3.5,   5.5,  11.5,  13.5])
    

    Runtime tests

    Calculating grouped_mean seems to be the most computationally intensive part of the code, so here are some runtime tests comparing the three approaches:

    In [174]: import numpy as np
         ...: # Setup parameters and input list
         ...: M2 = 4000
         ...: N1 = 2
         ...: N2 = 2
         ...: data = np.random.randint(0,9,(16000000)).tolist()
         ...: 
    
    In [175]: %timeit np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
         ...: %timeit np.einsum('ijk->ik',np.einsum('ij->i',np.array(data).reshape(-1,N2)).reshape(-1,N1,M2//N2))/(N1*N2)
         ...: %timeit np.einsum('ijkl->ik',np.array(data).reshape(-1, N1, M2//N2, N2))/(N1*N2)
         ...: 
    1 loops, best of 3: 2.2 s per loop
    1 loops, best of 3: 2.12 s per loop
    1 loops, best of 3: 2.1 s per loop
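The %timeit lines above only work inside IPython, and each one includes the `np.array(data)` conversion of a Python list. As a sketch, the standard-library timeit module can run a similar comparison as a plain script while timing the list-to-array conversion separately, so you can check how much of each figure comes from the conversion rather than the reduction. The sizes below are deliberately much smaller than in the session above (an assumption to keep it quick), and no particular timings are implied.

```python
import timeit
import numpy as np

# Smaller than the IPython session above; raise these to match it.
M2, N1, N2 = 400, 2, 2
n_rows = 400
data = np.random.randint(0, 9, n_rows * M2).tolist()
a = np.array(data)  # converted once, so the reductions are timed on their own

candidates = {
    "list -> array only": lambda: np.array(data),
    "sum": lambda: a.reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2),
    "einsum x2": lambda: np.einsum(
        'ijk->ik',
        np.einsum('ij->i', a.reshape(-1, N2)).reshape(-1, N1, M2 // N2)) / (N1 * N2),
    "einsum 4-D": lambda: np.einsum('ijkl->ik', a.reshape(-1, N1, M2 // N2, N2)) / (N1 * N2),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call
    timings[name] = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name:>20}: {timings[name] * 1e3:.3f} ms per call")
```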
    
  • 2020-12-21 08:41

    Put the data into a 4-d numpy array with shape (2, 2, 2, 2), then take the mean of that array over axes 1 and 3:

    In [25]: data
    Out[25]: [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
    
    In [26]: a = np.array(data).reshape(2, 2, 2, 2)
    
    In [27]: a
    Out[27]: 
    array([[[[ 1,  5],
             [ 9, 13]],
    
            [[ 2,  6],
             [10, 14]]],
    
    
           [[[ 3,  7],
             [11, 15]],
    
            [[ 4,  8],
             [12, 16]]]])
    
    In [28]: a.mean(axis=(1, 3))
    Out[28]: 
    array([[  3.5,  11.5],
           [  5.5,  13.5]])
    

    You can use the ravel() method if you need the final result as a 1-d array:

    In [31]: a.mean(axis=(1, 3)).ravel()
    Out[31]: array([  3.5,  11.5,   5.5,  13.5])
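The (2, 2, 2, 2) shape is specific to a 4 x 4 input with 2 x 2 blocks. A sketch of the general form, assuming the flat list is row-major with a known number of columns (block_mean and its parameter names are illustrative, not part of the answer): reshape to (rows // n1, n1, cols // n2, n2) and average over axes 1 and 3. Transposing before ravel() gives the ordering used in the other two answers.

```python
import numpy as np

def block_mean(data, n_cols, n1, n2):
    """Mean of each n1 x n2 block of a flat, row-major list with n_cols columns.

    Hypothetical helper generalizing reshape(2, 2, 2, 2); assumes the row count
    and n_cols are exact multiples of n1 and n2.
    """
    a = np.asarray(data).reshape(-1, n_cols)      # 2-D view: rows x n_cols
    n_rows = a.shape[0]
    if n_rows % n1 or n_cols % n2:
        raise ValueError("array shape must be divisible by the block shape")
    # Axes: (block row, row within block, block column, column within block)
    blocks = a.reshape(n_rows // n1, n1, n_cols // n2, n2)
    return blocks.mean(axis=(1, 3))               # shape: (block rows, block cols)

data = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
m = block_mean(data, n_cols=4, n1=2, n2=2)
print(m.ravel())     # [ 3.5 11.5  5.5 13.5]
print(m.T.ravel())   # [ 3.5  5.5 11.5 13.5]
```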
    

    See "How can I vectorize the averaging of 2x2 sub-arrays of numpy array?" for a similar question.

  • 2020-12-21 08:47

    Here is one approach:

    In [29]: a = np.array(data)
    
    In [30]: a2 = a.reshape(4,4)
    
    In [31]: a3 = np.vstack((a2[:, :2], a2[:, 2:]))
    
    In [32]: a4 = a3.reshape(4,4)
    
    In [33]: np.mean(a4, axis=1)
    Out[33]: array([  3.5,   5.5,  11.5,  13.5])
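The reshape(4,4) and the two hard-coded column slices tie this approach to a 4 x 4 input with 2 x 2 blocks. A sketch of one way to generalize it, assuming a flat row-major list with n_cols columns (block_mean_vstack and its parameters are illustrative names): np.hsplit cuts the columns into strips n2 wide, np.vstack stacks the strips, and every n1 consecutive rows of the result then form one block. The output is ordered by block column first, matching Out[33].

```python
import numpy as np

def block_mean_vstack(data, n_cols, n1, n2):
    """Hypothetical generalization of the vstack approach.

    Assumes a flat, row-major list whose row count and column count are
    multiples of n1 and n2 respectively.
    """
    a2 = np.asarray(data).reshape(-1, n_cols)
    # Cut the columns into strips n2 wide and stack the strips vertically;
    # the 4x4 case is np.vstack((a2[:, :2], a2[:, 2:])).
    a3 = np.vstack(np.hsplit(a2, n_cols // n2))
    # Every n1 consecutive rows of a3 now make up one n1 x n2 block.
    a4 = a3.reshape(-1, n1 * n2)
    return a4.mean(axis=1)

data = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
print(block_mean_vstack(data, n_cols=4, n1=2, n2=2))   # [ 3.5  5.5 11.5 13.5]
```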
    