Binning a numpy array


I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not t


4 Answers

长情又很酷

2021-01-04 08:33
Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:
```
In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([ 3. ,  5.5,  6. ,  3.5,  6. ])
```
Actually, this only works if the size of data is divisible by the bin size n. I'll edit in a fix.

Looks like Joe Kington has an answer that handles that.
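
To see why divisibility matters, here is a minimal sketch. It assumes the ten sample values used elsewhere in this thread, and the bin size of 3 is picked only to trigger the failure. NumPy refuses to reshape ten elements into rows of three:

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])

# 10 elements split evenly into rows of 2, so each row is one bin.
pairs = data.reshape(-1, 2)

# 10 elements do not split evenly into rows of 3, so reshape raises.
try:
    data.reshape(-1, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

So the array has to be trimmed to a multiple of the bin size before reshaping.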

眼角桃花

2021-01-04 08:35

Try this, using plain Python (NumPy isn't necessary for this). The code below runs on Python 3; on Python 2.x, use xrange in place of range:

```
data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]

# example: for n == 2
n = 2
partitions = [data[i:i+n] for i in range(0, len(data), n)]
# drop the last partition if it is shorter than n
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
partitions
# => [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean
[sum(x)/float(n) for x in partitions]
# => [3.0, 5.5, 6.0, 3.5, 6.0]
```


佛祖请我去吃肉

2021-01-04 08:43

Just use reshape and then mean(axis=1).

As the simplest possible example:

```
import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))
```

More generally, we'd need to do something like this to drop the last bin when the array length isn't an exact multiple of the bin width:

```
import numpy as np

width = 3
data = np.array([4,2,5,6,7,5,4,3,5,7])

# Trim to a whole number of bins, then reshape so each row is one bin
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
```
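
The trim-then-reshape step above can be wrapped in a small helper. This is only a sketch: the name `bin_means` is hypothetical (it is not a NumPy function), and it assumes 1-D input.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` samples.

    Assumes 1-D input. Trailing samples that do not fill a whole bin
    are dropped. (Hypothetical helper name, not part of NumPy.)
    """
    data = np.asarray(data)
    if width < 1:
        raise ValueError("width must be a positive integer")
    n_bins = data.size // width          # number of complete bins
    trimmed = data[:n_bins * width]      # drop the incomplete tail
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
pairs = bin_means(data, 2)    # five bins of two samples
triples = bin_means(data, 3)  # three bins of three; the final 7 is dropped
```

Passing `n_bins` explicitly to reshape instead of -1 keeps the call valid when the input is shorter than one bin, in which case the result is an empty array.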


面向向阳花

2021-01-04 08:52

I wrote a function that does this for an array of any size and along any dimension you want.

data is your array
axis is the axis you want to bin along
binstep is the number of points between the starts of consecutive bins (this allows overlapping bins)
binsize is the size of each bin
func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...)

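```
import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # Swap the binning axis with axis 0 so bins can be taken along axis 0
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # Count only bins that fit entirely inside the array, so overlapping
    # bins (binsize > binstep) never index past the end
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0) for i in range(nbins)]
    # Swap the axes back to their original order
    data = np.array(data).transpose(argdims)
    return data
```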

In your case it would be:

```
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
```

or, for a bin size of 3:

```
bin_data_mean = binArray(data, 0, 3, 3, np.mean)
```
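
For the 1-D case, overlapping bins can also be sketched with NumPy's `sliding_window_view` (available from NumPy 1.20). This is an alternative technique rather than the function above, and the bin size of 3 and step of 2 are illustrative values only:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
binsize, binstep = 3, 2  # bins of 3 samples, starting every 2 samples

# Build every window of length binsize, then keep every binstep-th one.
windows = sliding_window_view(data, binsize)[::binstep]
overlapping_means = windows.mean(axis=1)
```

Here the kept windows start at indices 0, 2, 4 and 6. A window starting at index 8 would run past the end of the array, so it is never produced.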
