I have a 1-d list as follows:
data = [1,5,9,13,
2,6,10,14,
3,7,11,15,
4,8,12,16]
I want to obtain a list of the mean values over each block of that data.

Listed in this post are some solution suggestions -
import numpy as np

def grouped_mean(data, M2, N1, N2):
    # Parameters:
    # M2 = number of columns in the input data
    # N1, N2 = blocksize into which data is to be divided and averaged
    # Sum within each N2-wide slice, then across N1 such slices, and normalize.
    # Note: integer division (//) keeps the reshape dimensions integral on Python 3.
    grouped_mean = np.array(data).reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)
    # Return transposed and flattened version as output (as per OP)
    return grouped_mean.T.ravel()
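To trace what the two reshape/sum stages are doing, here is a step-by-step sketch on the 4x4 sample (the intermediate names stage1/stage2 are mine; M2 // N2 assumes Python 3 integer division):

```python
import numpy as np

data = [1, 5, 9, 13,
        2, 6, 10, 14,
        3, 7, 11, 15,
        4, 8, 12, 16]
M2, N1, N2 = 4, 2, 2

a = np.array(data).reshape(-1, N2)         # (8, 2): each row is one N2-wide slice
stage1 = a.sum(1)                          # (8,): sum within each slice
stage2 = stage1.reshape(-1, N1, M2 // N2)  # (2, 2, 2): group N1 slice-sums per block
gm = stage2.sum(1) / (N1 * N2)             # (2, 2): block means
out = gm.T.ravel()                         # transpose and flatten, as per OP
print(out)  # [ 3.5  5.5 11.5 13.5]
```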
Now, grouped_mean could be calculated with np.einsum instead of np.sum, like so -

stage1_sum = np.einsum('ij->i', np.array(data).reshape(-1, N2))
grouped_mean = np.einsum('ijk->ik', stage1_sum.reshape(-1, N1, M2 // N2)) / (N1 * N2)
Or, one can split the 2D input array into a 4D array, as suggested in @Warren Weckesser's solution, and then use np.einsum, like so -

split_data = np.array(data).reshape(-1, N1, M2 // N2, N2)
grouped_mean = np.einsum('ijkl->ik', split_data) / (N1 * N2)
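As a sanity check (my own snippet, not from the original post), all three routes should produce identical results on arbitrary input:

```python
import numpy as np

M2, N1, N2 = 8, 2, 2
data = np.random.randint(0, 9, 160).tolist()  # 20 rows x 8 cols, flattened
a = np.array(data)

# Approach 1: two-stage reshape + sum
r1 = a.reshape(-1, N2).sum(1).reshape(-1, N1, M2 // N2).sum(1) / (N1 * N2)
# Approach 2: two-stage einsum
r2 = np.einsum('ijk->ik',
               np.einsum('ij->i', a.reshape(-1, N2)).reshape(-1, N1, M2 // N2)) / (N1 * N2)
# Approach 3: single einsum over a 4D split
r3 = np.einsum('ijkl->ik', a.reshape(-1, N1, M2 // N2, N2)) / (N1 * N2)

assert np.allclose(r1, r2) and np.allclose(r1, r3)
```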
Sample run -
In [182]: data = np.array([[1,5,9,13],
...: [2,6,10,14],
...: [3,7,11,15],
...: [4,8,12,16]])
In [183]: grouped_mean(data,4,2,2)
Out[183]: array([ 3.5, 5.5, 11.5, 13.5])
Runtime tests

Calculating grouped_mean seems to be the most computationally intensive part of the code, so here are some runtime tests comparing the three approaches -
In [174]: import numpy as np
...: # Setup parameters and input list
...: M2 = 4000
...: N1 = 2
...: N2 = 2
...: data = np.random.randint(0,9,(16000000)).tolist()
...:
In [175]: %timeit np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
     ...: %timeit np.einsum('ijk->ik',np.einsum('ij->i',np.array(data).reshape(-1,N2)).reshape(-1,N1,M2//N2))/(N1*N2)
     ...: %timeit np.einsum('ijkl->ik',np.array(data).reshape(-1, N1, M2//N2, N2))/(N1*N2)
...:
1 loops, best of 3: 2.2 s per loop
1 loops, best of 3: 2.12 s per loop
1 loops, best of 3: 2.1 s per loop