Binning a numpy array

别跟我提以往 2021-01-04 08:04

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not t

4 answers
  • 2021-01-04 08:33

    Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:

    In [33]: data.reshape(2, -1)
    Out[33]: 
    array([[4, 2, 5, 6, 7],
           [5, 4, 3, 5, 7]])
    
    In [34]: data.reshape(2, -1).mean(0)
    Out[34]: array([ 4.5,  3. ,  4. ,  5.5,  7. ])
    

    Actually, this only works if the size of data is divisible by n. I'll edit in a fix.

    Looks like Joe Kington has an answer that handles that.
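
    To see which elements end up in the same bin, it can help to look at the reshaped array directly. The following is a minimal sketch, assuming data is the ten-element example array used in the other answers and a bin width of 2. reshape(2, -1) produces two rows of five, so mean(0) pairs data[i] with data[i + 5], whereas reshape(-1, 2) produces five rows of two consecutive elements:

```python
import numpy as np

# Example array taken from the other answers (an assumption for this answer).
data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])

# Two rows of five: each column pairs data[i] with data[i + 5].
halves = data.reshape(2, -1)
print(halves.mean(axis=0))

# Five rows of two: each row holds two consecutive elements.
pairs = data.reshape(-1, 2)
print(pairs.mean(axis=1))
```

    Which layout is wanted depends on whether a bin should hold consecutive samples or samples that are half an array apart.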

  • 2021-01-04 08:35

    Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 3 is in use:

    data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
    
    # example: for n == 2
    n = 2
    partitions = [data[i:i+n] for i in range(0, len(data), n)]
    # drop the last partition if it is shorter than n
    partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
    
    # the above produces a list of lists
    partitions
    # => [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
    
    # now the mean
    [sum(x) / n for x in partitions]
    # => [3.0, 5.5, 6.0, 3.5, 6.0]
    
  • 2021-01-04 08:43

    Just use reshape and then mean(axis=1).

    As the simplest possible example:

    import numpy as np
    
    data = np.array([4,2,5,6,7,5,4,3,5,7])
    
    print(data.reshape(-1, 2).mean(axis=1))
    

    More generally, we'd need to do something like this to drop the last bin when the array length is not an exact multiple of the bin width:

    import numpy as np
    
    width=3
    data = np.array([4,2,5,6,7,5,4,3,5,7])
    
    result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
    
    print(result)
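
    The same idea can be wrapped in a small reusable function. This is only a sketch: bin_mean is a hypothetical helper name (not part of NumPy), and it assumes a 1-D input and non-overlapping bins:

```python
import numpy as np

def bin_mean(data, width):
    """Average consecutive, non-overlapping bins of `width` samples.

    Trailing samples that do not fill a whole bin are dropped.
    `bin_mean` is a hypothetical helper name, not a NumPy function.
    Assumes `data` is one-dimensional.
    """
    data = np.asarray(data)
    n_bins = data.size // width        # number of complete bins
    trimmed = data[:n_bins * width]    # drop the incomplete tail
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
print(bin_mean(data, 2))  # five bins of two samples
print(bin_mean(data, 3))  # three bins of three samples; the last sample is dropped
```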
    
  • 2021-01-04 08:52

    I wrote a function that applies this to arrays of any size and any number of dimensions.

    • data is your array
    • axis is the axis you want to bin along
    • binstep is the number of points between the starts of consecutive bins (a binstep smaller than binsize gives overlapping bins)
    • binsize is the size of each bin
    • func is the function you want to apply to the bin (np.max for maxpooling, np.mean for an average ...)

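      import numpy as np

      def binArray(data, axis, binstep, binsize, func=np.nanmean):
          data = np.array(data)
          dims = np.array(data.shape)
          # swap the target axis with axis 0 so the bins can be taken along axis 0
          argdims = np.arange(data.ndim)
          argdims[0], argdims[axis] = argdims[axis], argdims[0]
          data = data.transpose(argdims)
          # count only the bins that fit entirely inside the array, so that
          # overlapping bins (binsize > binstep) do not index past the end
          nbins = (dims[axis] - binsize) // binstep + 1
          data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0)
                  for i in np.arange(nbins)]
          # swap the axes back to their original order
          data = np.array(data).transpose(argdims)
          return data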
      

    In your case it would be:

    data = [4,2,5,6,7,5,4,3,5,7]
    bin_data_mean = binArray(data, 0, 2, 2, np.mean)
    

    or for the bin size of 3:

    bin_data_mean = binArray(data, 0, 3, 3, np.mean)
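
    For 1-D data, overlapping bins can also be sketched without an explicit loop. Note that this is a different technique from binArray above: it relies on numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or later. bin_overlapping is a hypothetical helper name, and the sketch assumes binsize is not larger than the array (otherwise sliding_window_view raises a ValueError):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bin_overlapping(data, binstep, binsize, func=np.mean):
    """Apply `func` to 1-D windows of `binsize` points taken every `binstep` points.

    Hypothetical helper; only windows that fit entirely in the array are used.
    """
    data = np.asarray(data)
    # All windows of length binsize, then keep every binstep-th one.
    windows = sliding_window_view(data, binsize)[::binstep]  # shape: (n_bins, binsize)
    return func(windows, axis=-1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
overlap_mean = bin_overlapping(data, 2, 3)        # windows start at indices 0, 2, 4, 6
pooled_max = bin_overlapping(data, 3, 3, np.max)  # non-overlapping max pooling
print(overlap_mean)
print(pooled_max)
```

    Because sliding_window_view returns a view, no copies of the windows are made until func reduces them.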
    