I have a 1-D list as follows:
data = [1,5,9,13,
2,6,10,14,
3,7,11,15,
4,8,12,16]
I want to average each 2x2 block of the corresponding 4x4 array and get the result as a 1-D list, i.e. [3.5, 5.5, 11.5, 13.5].
Listed in this post are a few suggested solutions -
import numpy as np

def grouped_mean(data, M2, N1, N2):
    # Parameters:
    # M2 = number of columns in the input data
    # N1, N2 = block size into which the data is divided and averaged
    # Two-stage reduction: sum within blocks, then divide by the block size
    grouped_mean = np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
    # Return transposed and flattened version as output (as per OP)
    return grouped_mean.T.ravel()
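As a side note, that one-liner can be unpacked into its two reduction stages; here is a minimal sketch on the sample input (the names stage1, stage2 and out are just for illustration, and the intermediate values are noted in the comments) -

import numpy as np

data = [1,5,9,13, 2,6,10,14, 3,7,11,15, 4,8,12,16]
M2, N1, N2 = 4, 2, 2

# Stage 1: sum adjacent groups of N2 elements along the flattened input
stage1 = np.array(data).reshape(-1, N2).sum(1)      # [ 6 22  8 24 10 26 12 28]

# Stage 2: regroup those partial sums into sets of N1 rows and sum again
stage2 = stage1.reshape(-1, N1, M2//N2).sum(1)      # [[14 46]
                                                    #  [22 54]]

# Divide by the block size, then transpose and flatten for the final ordering
out = (stage2/(N1*N2)).T.ravel()                    # [ 3.5  5.5 11.5 13.5]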
Now, grouped_mean could be calculated with np.einsum instead of np.sum, like so -
stage1_sum = np.einsum('ij->i',np.array(data).reshape(-1,N2))
grouped_mean = np.einsum('ijk->ik',stage1_sum.reshape(-1,N1,M2//N2))/(N1*N2)
Or, one can split the 2D input array into a 4D array, as suggested in @Warren Weckesser's solution, and then use np.einsum, like so -
split_data = np.array(data).reshape(-1, N1, M2//N2, N2)
grouped_mean = np.einsum('ijkl->ik',split_data)/(N1*N2)
Sample run -
In [182]: data = np.array([[1,5,9,13],
...: [2,6,10,14],
...: [3,7,11,15],
...: [4,8,12,16]])
In [183]: grouped_mean(data,4,2,2)
Out[183]: array([ 3.5, 5.5, 11.5, 13.5])
Runtime tests
Calculating grouped_mean seems to be the most computationally intensive part of the code, so here are some runtime tests for it with those three approaches -
In [174]: import numpy as np
...: # Setup parameters and input list
...: M2 = 4000
...: N1 = 2
...: N2 = 2
...: data = np.random.randint(0,9,(16000000)).tolist()
...:
In [175]: %timeit np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)
     ...: %timeit np.einsum('ijk->ik',np.einsum('ij->i',np.array(data).reshape(-1,N2)).reshape(-1,N1,M2//N2))/(N1*N2)
     ...: %timeit np.einsum('ijkl->ik',np.array(data).reshape(-1, N1, M2//N2, N2))/(N1*N2)
...:
1 loops, best of 3: 2.2 s per loop
1 loops, best of 3: 2.12 s per loop
1 loops, best of 3: 2.1 s per loop
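For reference, the same comparison can be scripted outside IPython with the standard timeit module. The sketch below is not from the original post and uses a smaller input purely to keep the run short -

import timeit
import numpy as np

M2, N1, N2 = 4000, 2, 2
data = np.random.randint(0, 9, 1600000).tolist()   # 1/10th the size used above

stmts = ["np.array(data).reshape(-1,N2).sum(1).reshape(-1,N1,M2//N2).sum(1)/(N1*N2)",
         "np.einsum('ijk->ik',np.einsum('ij->i',np.array(data).reshape(-1,N2)).reshape(-1,N1,M2//N2))/(N1*N2)",
         "np.einsum('ijkl->ik',np.array(data).reshape(-1, N1, M2//N2, N2))/(N1*N2)"]
env = {'np': np, 'data': data, 'M2': M2, 'N1': N1, 'N2': N2}
for stmt in stmts:
    # Average of 3 runs for each of the three approaches
    print(round(timeit.timeit(stmt, number=3, globals=env)/3, 3), 's per loop')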
Put the data into a 4-d numpy array with shape (2, 2, 2, 2), then take the mean of that array over axes 1 and 3:
In [25]: data
Out[25]: [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
In [26]: a = np.array(data).reshape(2, 2, 2, 2)
In [27]: a
Out[27]:
array([[[[ 1,  5],
         [ 9, 13]],

        [[ 2,  6],
         [10, 14]]],


       [[[ 3,  7],
         [11, 15]],

        [[ 4,  8],
         [12, 16]]]])
In [28]: a.mean(axis=(1, 3))
Out[28]:
array([[ 3.5, 11.5],
       [ 5.5, 13.5]])
You can use the ravel() method if you need the final result as a 1-d array:
In [31]: a.mean(axis=(1, 3)).ravel()
Out[31]: array([ 3.5, 11.5, 5.5, 13.5])
See How can I vectorize the averaging of 2x2 sub-arrays of numpy array? for a similar question.
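The same reshape-then-mean idea generalizes to any 2-D array whose shape is an exact multiple of the block size. A minimal sketch, assuming a helper name block_mean that is mine rather than from the answer above -

import numpy as np

data = [1,5,9,13, 2,6,10,14, 3,7,11,15, 4,8,12,16]

def block_mean(a, br, bc):
    # Mean of non-overlapping br x bc blocks of a 2-D array a.
    # Assumes a.shape is an exact multiple of (br, bc).
    r, c = a.shape
    return a.reshape(r//br, br, c//bc, bc).mean(axis=(1, 3))

a = np.array(data).reshape(4, 4)       # same 4x4 layout as in the question
print(block_mean(a, 2, 2))             # [[ 3.5 11.5]
                                       #  [ 5.5 13.5]]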
Here is one approach: rearrange the array so that each 2x2 block occupies a single row, then take row means -
In [29]: a = np.array(data)
In [30]: a2 = a.reshape(4,4)
In [31]: a3 = np.vstack((a2[:, :2], a2[:, 2:]))
In [32]: a4 = a3.reshape(4,4)
In [33]: np.mean(a4, axis=1)
Out[33]: array([ 3.5, 5.5, 11.5, 13.5])
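For completeness, here is the same session restated as a plain script, with the intermediate arrays shown in comments for illustration -

import numpy as np

data = [1,5,9,13, 2,6,10,14, 3,7,11,15, 4,8,12,16]
a2 = np.array(data).reshape(4, 4)        # [[ 1  5  9 13], [ 2  6 10 14], ...]
a3 = np.vstack((a2[:, :2], a2[:, 2:]))   # left halves stacked on right halves -> shape (8, 2)
a4 = a3.reshape(4, 4)                    # each row is now one 2x2 block:
                                         # [[ 1  5  2  6]
                                         #  [ 3  7  4  8]
                                         #  [ 9 13 10 14]
                                         #  [11 15 12 16]]
print(np.mean(a4, axis=1))               # [ 3.5  5.5 11.5 13.5]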