问题
I have a numpy array and would like to count the number of occurences for each value, however, in a cumulative way
in = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]
I'm wondering if it is best to create a (sparse) matrix with ones at col = i and row = in[i]
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0
Then we could compute the cumsums along the rows and extract the numbers from the locations where the cumsums increment.
However, if we cumsum a sparse matrix, doesn't become dense? Is there an efficient way of doing it?
回答1:
Here's one vectorized approach using sorting
-
def cumcount(a):
# Store length of array
n = len(a)
# Get sorted indices (use later on too) and store the sorted array
sidx = a.argsort()
b = a[sidx]
# Mask of shifts/groups
m = b[1:] != b[:-1]
# Get indices of those shifts
idx = np.flatnonzero(m)
# ID array that will store the cumulative nature at the very end
id_arr = np.ones(n,dtype=int)
id_arr[idx[1:]+1] = -np.diff(idx)+1
id_arr[idx[0]+1] = -idx[0]
id_arr[0] = 0
c = id_arr.cumsum()
# Finally re-arrange those cumulative values back to original order
out = np.empty(n, dtype=int)
out[sidx] = c
return out
Sample run -
In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])
In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
来源:https://stackoverflow.com/questions/48690864/python-vectorized-cumulative-counting