Cumulative summation of a numpy array by index

后端未结

关注

 5  592

北海茫月 2021-02-08 11:54

Assume you have an array of values that will need to be summed together

d = [1,1,1,1,1]

and a second array specifying which elements need to be

5条回答

难免孤独 (楼主)

2021-02-08 12:46

In the general case when you want to sum submatrices by labels you can use the following code

import numpy as np
from scipy.sparse import coo_matrix

def labeled_sum1(x, labels):
     P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels)))))
     res = P.dot(x.reshape((x.shape[0], np.prod(x.shape[1:]))))
     return res.reshape((res.shape[0],) + x.shape[1:])

def labeled_sum2(x, labels):
     res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype)
     for i in np.ndindex(x.shape[1:]):
         res[(...,)+i] = np.bincount(labels, x[(...,)+i])
     return res

The first method use the sparse matrix multiplication. The second one is the generalization of user333700's answer. Both methods have comparable speed:

x = np.random.randn(100000, 10, 10)
labels = np.random.randint(0, 1000, 100000)
%time res1 = labeled_sum1(x, labels)
%time res2 = labeled_sum2(x, labels)
np.all(res1 == res2)

Output:

Wall time: 73.2 ms
Wall time: 68.9 ms
True

0 讨论(0)

查看其它5个回答