Cumulative summation of a numpy array by index

北海茫月 2021-02-08 11:54

Assume you have an array of values that will need to be summed together

d = [1,1,1,1,1]

and a second array specifying which elements need to be

5 Answers
  • 2021-02-08 12:37
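    def zeros(ilen):
        # Build a list of ilen zeros to accumulate the sums into.
        return [0] * ilen
    
    i_list = [0,0,1,2,2]
    d = [1,1,1,1,1]
    result = zeros(max(i_list)+1)
    
    # Add each value to the slot named by its matching index.
    for index, value in zip(i_list, d):
        result[index] += value
    
    print(result)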
    
  • 2021-02-08 12:46

    In the general case, when you want to sum submatrices by label, you can use the following code:

    import numpy as np
    from scipy.sparse import coo_matrix
    
    def labeled_sum1(x, labels):
        # P[k, n] is 1 when row n of x carries label k, so P.dot(x) sums rows per label.
        P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels)))))
        # Flatten the trailing axes; -1 also handles a 1-D x.
        res = P.dot(x.reshape((x.shape[0], -1)))
        return res.reshape((res.shape[0],) + x.shape[1:])
    
    def labeled_sum2(x, labels):
        res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype)
        # Run a weighted bincount along the first axis for each trailing position.
        for i in np.ndindex(x.shape[1:]):
            res[(...,)+i] = np.bincount(labels, x[(...,)+i])
        return res
    

    The first method uses sparse matrix multiplication. The second generalizes user333700's answer. Both methods run at comparable speed:

    x = np.random.randn(100000, 10, 10)
    labels = np.random.randint(0, 1000, 100000)
    %time res1 = labeled_sum1(x, labels)
    %time res2 = labeled_sum2(x, labels)
    np.all(res1 == res2)
    

    Output:

    Wall time: 73.2 ms
    Wall time: 68.9 ms
    True
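
As a hand-checkable illustration of what "summing submatrices by label" produces, here is a minimal sketch on a tiny input. It uses `np.add.at`, which is a different technique from both functions in this answer, and the helper name `labeled_sum_ref` is hypothetical.

```python
import numpy as np

def labeled_sum_ref(x, labels):
    # Hypothetical reference helper, not part of the answer above.
    # np.add.at does unbuffered in-place addition, so rows that share
    # a label accumulate instead of overwriting each other.
    out = np.zeros((labels.max() + 1,) + x.shape[1:], dtype=x.dtype)
    np.add.at(out, labels, x)
    return out

x = np.arange(12, dtype=float).reshape(4, 3)  # four rows of length 3
labels = np.array([0, 0, 2, 2])               # label 1 has no rows
res = labeled_sum_ref(x, labels)
print(res)
```

Rows 0 and 1 are summed into output row 0, rows 2 and 3 into output row 2, and output row 1 stays at zero because no input row carries label 1.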
    
  • 2021-02-08 12:50

    This solution should be more efficient for large arrays (it iterates over the possible index values instead of the individual entries of i):

    import numpy as np
    
    i = np.array([0,0,1,2,2])
    d = np.array([0,1,2,3,4])
    
    i_max = i.max()
    c = np.empty(i_max+1)
    for j in range(i_max+1):
        c[j] = d[i==j].sum()
    
    print(c)
    [1. 2. 7.]
    
  • 2021-02-08 12:51

    If I understand the question correctly, there is a fast function for this (as long as the data array is 1d)

    >>> i = np.array([0,0,1,2,2])
    >>> d = np.array([0,1,2,3,4])
    >>> np.bincount(i, weights=d)
    array([ 1.,  2.,  7.])
    

    np.bincount returns an array with one entry for every integer in range(max(i) + 1), even if some counts are zero.
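
To make the zero-filled entries concrete, here is a small sketch with a gap in the index values. The arrays are made up for illustration; `minlength` is an existing `np.bincount` parameter that pads the result to a fixed number of bins.

```python
import numpy as np

i = np.array([0, 0, 3])        # index values 1 and 2 never occur
d = np.array([1.0, 2.0, 5.0])

# One bin per integer from 0 through i.max(); unused bins hold 0.
sums = np.bincount(i, weights=d)
print(sums)

# minlength guarantees at least that many bins, e.g. for a known label count.
padded = np.bincount(i, weights=d, minlength=6)
print(padded)
```

Passing `minlength` is useful when the largest label may be absent from a given batch but the output length must stay fixed.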

  • 2021-02-08 12:54

    Juh_'s comment is the most efficient solution. Here's working code:

    import numpy as np
    import scipy.ndimage as ni
    
    i = np.array([0,0,1,2,2])
    d = np.array([0,1,2,3,4])
    
    n_indices = i.max() + 1
    print(ni.sum(d, i, np.arange(n_indices)))
    