I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not t
Since you already have a numpy array, to avoid for loops, you can use reshape
and consider the new dimension to be the bin:
In [33]: data.reshape(2, -1)
Out[33]:
array([[4, 2, 5, 6, 7],
[5, 4, 3, 5, 7]])
In [34]: data.reshape(2, -1).mean(0)
Out[34]: array([ 4.5, 3. , 4. , 5.5, 7. ])
Actually this will just work if the size of data
is divisible by n
. I'll edit a fix.
Looks like Joe Kington has an answer that handles that.
Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:
data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]
# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
Just use reshape
and then mean(axis=1)
.
As the simplest possible example:
import numpy as np
data = np.array([4,2,5,6,7,5,4,3,5,7])
print data.reshape(-1, 2).mean(axis=1)
More generally, we'd need to do something like this to drop the last bin when it's not an even multiple:
import numpy as np
width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print result
I just wrote a function to apply it to all array size or dimension you want.
func is the function you want to apply to the bin (np.max for maxpooling, np.mean for an average ...)
def binArray(data, axis, binstep, binsize, func=np.nanmean):
data = np.array(data)
dims = np.array(data.shape)
argdims = np.arange(data.ndim)
argdims[0], argdims[axis]= argdims[axis], argdims[0]
data = data.transpose(argdims)
data = [func(np.take(data,np.arange(int(i*binstep),int(i*binstep+binsize)),0),0) for i in np.arange(dims[axis]//binstep)]
data = np.array(data).transpose(argdims)
return data
In you case it will be :
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
or for the bin size of 3:
bin_data_mean = binArray(data, 0, 3, 3, np.mean)