I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray.
The mean of two values a and b is 0.5*(a+b). Therefore you can do it like this:
newArray = 0.5*(originalArray[0::2] + originalArray[1::2])
It sums each pair of consecutive rows and then multiplies every element by 0.5. This assumes originalArray has an even number of rows, so that both slices have the same length.
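As a quick check, here is a minimal sketch on a small made-up array. The sample values are assumptions chosen for illustration, and a plain (unmasked) array with an even number of rows is used for simplicity:

```python
import numpy as np

# Hypothetical 4x2 sample data; the real originalArray would be much larger.
originalArray = np.array([[1.0, 2.0],
                          [3.0, 4.0],
                          [5.0, 6.0],
                          [7.0, 8.0]])

# Rows 0, 2, ... plus rows 1, 3, ..., then halve:
# this is the mean of each pair of consecutive rows.
newArray = 0.5 * (originalArray[0::2] + originalArray[1::2])
print(newArray)  # rows [2, 3] and [6, 7]
```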
Since the title asks for the average over N rows, here is a more general solution:
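import numpy as np

def groupedAvg(myArray, N=2):
    # Running sum down the rows; keep every N-th entry and scale by 1/N.
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    # Subtract the previous group's scaled running sum to get each group's mean.
    result[1:] = result[1:] - result[:-1]
    return result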
The general form of the average over n elements is sum([x1,x2,...,xn])/n.
The sum of elements m to m+n in vector v is the same as subtracting the (m-1)th element from the (m+n)th element of cumsum(v), unless m is 0; in that case you don't subtract anything (result[0]).
That is what we take advantage of here. Also, since everything is linear, it does not matter where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
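The cumulative-sum identity can be checked directly. The following is a small sketch with a made-up vector; v, m, and n are illustrative values, not anything from the question:

```python
import numpy as np

# Hypothetical sample vector, chosen only for illustration.
v = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
c = np.cumsum(v)

# Sum of elements m..m+n (inclusive) equals c[m+n] - c[m-1] when m > 0.
m, n = 2, 3
direct = v[m:m + n + 1].sum()        # 4 + 1 + 5 + 9
via_cumsum = c[m + n] - c[m - 1]
assert np.isclose(direct, via_cumsum)

# When m == 0 there is nothing to subtract.
assert np.isclose(v[:n + 1].sum(), c[n])

# Linearity: dividing before or after the subtraction gives the same mean.
count = n + 1
assert np.isclose((c[m + n] - c[m - 1]) / count,
                  c[m + n] / count - c[m - 1] / count)
```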
If the last group has fewer than N elements, it will be ignored completely.
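To make that concrete, here is a short sketch with five made-up rows and N=2. groupedAvg is repeated so the snippet runs on its own; the data values are assumptions for illustration:

```python
import numpy as np

def groupedAvg(myArray, N=2):
    result = np.cumsum(myArray, 0)[N-1::N] / float(N)
    result[1:] = result[1:] - result[:-1]
    return result

# Hypothetical data: 5 rows, so with N=2 the last row has no partner.
data = np.array([[1.0, 10.0],
                 [3.0, 30.0],
                 [5.0, 50.0],
                 [7.0, 70.0],
                 [9.0, 90.0]])

out = groupedAvg(data, N=2)
print(out)  # two rows; the fifth input row does not contribute
```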
If you don't want to ignore it, you have to treat the last group specially:
def avg(myArray, N=2):
    cum = np.cumsum(myArray,0)
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])
    return result
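A usage sketch for avg, again on made-up two-column data (the function is repeated so the snippet is self-contained). It covers a remainder of one row, a remainder of two rows, and the case where N exceeds the number of rows:

```python
import numpy as np

def avg(myArray, N=2):
    cum = np.cumsum(myArray, 0)
    result = cum[N-1::N] / float(N)
    result[1:] = result[1:] - result[:-1]
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1] - cum[-1-remainder]) / float(remainder)
        else:
            lastAvg = cum[-1] / float(remainder)
        result = np.vstack([result, lastAvg])
    return result

# Hypothetical data: 5 rows, 2 columns.
data = np.array([[1.0, 10.0],
                 [3.0, 30.0],
                 [5.0, 50.0],
                 [7.0, 70.0],
                 [9.0, 90.0]])

out2 = avg(data, N=2)  # groups: rows 0-1, rows 2-3, row 4 alone
out3 = avg(data, N=3)  # groups: rows 0-2, rows 3-4
out8 = avg(data, N=8)  # fewer than N rows: a single group with all 5 rows
print(out2, out3, out8, sep="\n")
```

Note that np.vstack is used to append the last average, so this sketch assumes 2-D input as in the question.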