Assume you have an array of values that will need to be summed together
d = [1,1,1,1,1]
and a second array specifying which elements need to be summed together.
In the general case, when you want to sum submatrices by label, you can use the following code:
import numpy as np
from scipy.sparse import coo_matrix

def labeled_sum1(x, labels):
    # P[k, j] is 1 where element j has label k, so P.dot(...) sums rows by label.
    P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels)))))
    # Flatten trailing axes; -1 also handles 1-D x, where np.prod(()) is the float 1.0.
    res = P.dot(x.reshape((x.shape[0], -1)))
    return res.reshape((res.shape[0],) + x.shape[1:])

def labeled_sum2(x, labels):
    res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype)
    # Run a weighted bincount for every position in the trailing axes.
    for i in np.ndindex(x.shape[1:]):
        res[(...,)+i] = np.bincount(labels, x[(...,)+i])
    return res
The first method uses sparse matrix multiplication. The second is a generalization of user333700's answer. Both methods have comparable speed:
x = np.random.randn(100000, 10, 10)
labels = np.random.randint(0, 1000, 100000)
%time res1 = labeled_sum1(x, labels)
%time res2 = labeled_sum2(x, labels)
np.all(res1 == res2)
Output:
Wall time: 73.2 ms
Wall time: 68.9 ms
True
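
To see why the sparse-matrix approach works, here is a minimal sketch on a tiny 1-D input. It reuses `d` from the question; the `labels` array is made up for illustration and is not taken from the original post. The sketch builds the same indicator matrix as `labeled_sum1` and compares the result with a plain weighted `np.bincount`.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Values from the question; the labels below are hypothetical, for illustration only.
d = np.array([1, 1, 1, 1, 1], dtype=float)
labels = np.array([0, 0, 1, 2, 2])

# Indicator matrix: P[k, j] == 1 exactly when element j carries label k.
P = coo_matrix((np.ones(len(labels)), (labels, np.arange(len(labels)))))
print(P.toarray())
# [[1. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 1. 1.]]

# Multiplying by P adds up the entries of d that share a label.
sums_sparse = P.dot(d)

# For 1-D input, a weighted bincount computes the same per-label sums.
sums_bincount = np.bincount(labels, weights=d)

print(sums_sparse)    # [2. 1. 2.]
print(sums_bincount)  # [2. 1. 2.]
```

Each row of `P` selects the elements belonging to one label, so the matrix product is the per-label sum. The N-dimensional versions above apply the same idea after flattening the trailing axes.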