Vectorizing a Numpy slice operation

后端 未结 3 977
梦如初夏
梦如初夏 2021-01-13 00:53

Say I have a Numpy vector,

A = zeros(100)

and I divide it into subvectors by a list of breakpoints which index into A, for ins

相关标签:
3条回答
  • 2021-01-13 01:05

    You could use np.repeat:

    In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
    Out[35]: 
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
           4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
           6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
           9, 9, 9, 9, 9, 9, 9, 9])
    

    To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:

    import numpy as np
    import scipy.stats as stats
    
    breaks = np.linspace(0, 100, 11, dtype=int)
    A = np.random.random(100)
    
    means, bin_edges, binnumber = stats.binned_statistic(
        x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
    

    stats.binned_statistic can compute means, medians, counts, sums; or, to compute an arbitrary statistics for each bin, you can pass a callable to the statistic parameter:

    def func(values):
        return values.mean()
    
    funcmeans, bin_edges, binnumber = stats.binned_statistic(
        x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
    
    assert np.allclose(means, funcmeans)
    
    0 讨论(0)
  • 2021-01-13 01:19

    There really isn't a single answer to your question, but several techniques that you can use as building blocks. Another one you may find helpful:

    All numpy ufuncs have a .reduceat method, which you can use to your advantage for some of your calculations:

    >>> a = np.arange(100)
    >>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
    >>> counts = np.diff(breaks)
    >>> counts
    array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
    >>> sums = np.add.reduceat(a, breaks[:-1], dtype=np.float)
    >>> sums
    array([  45.,  145.,  245.,  345.,  445.,  545.,  645.,  745.,  845.,  945.])
    >>> sums / counts  # i.e. the mean
    array([  4.5,  14.5,  24.5,  34.5,  44.5,  54.5,  64.5,  74.5,  84.5,  94.5])
    
    0 讨论(0)
  • 2021-01-13 01:22

    You can use simple np.cumsum -

    import numpy as np
    
    # Form zeros array of same size as input array and 
    # place ones at positions where intervals change
    A1 = np.zeros_like(A)
    A1[breaks[1:-1]] = 1
    
    # Perform cumsum along it to create a staircase like array, as the final output
    out = A1.cumsum()
    

    Sample run -

    In [115]: A
    Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
    
    In [116]: breaks
    Out[116]: array([ 0,  4,  9, 11, 18, 20])
    
    In [142]: out
    Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4]..)
    

    If you want to have mean values of those subvectors from A, you can use np.bincount -

    mean_vals = np.bincount(out, weights=A)/np.bincount(out)
    

    If you are looking to extend this functionality and use a custom function instead, you might want to look into MATLAB's accumarray equivalent for Python/Numpy: accum whose source code is available here.

    0 讨论(0)
提交回复
热议问题