Question
Suppose I have a scipy.sparse.csr_matrix representing the values below:
[[0 0 1 2 0 3 0 4]
 [1 0 0 2 0 3 4 0]]
I want to calculate the cumulative sum of non-zero values in-place, which would change the array to:
[[0 0 1 3 0 6 0 10]
 [1 0 0 3 0 6 10 0]]
The actual values are not 1, 2, 3, ...
The number of non-zero values in each row is unlikely to be the same.
How can I do this fast?
Current program:
import scipy.sparse
import numpy as np

# sparse data
a = scipy.sparse.csr_matrix(
    [[0, 0, 1, 2, 0, 3, 0, 4],
     [1, 0, 0, 2, 0, 3, 4, 0]],
    dtype=int)

# method: cumulative sum over each row's stored values, in place
indptr = a.indptr
data = a.data
for i in range(a.shape[0]):
    st = indptr[i]
    en = indptr[i + 1]
    np.cumsum(data[st:en], out=data[st:en])

# print result
print(a.todense())
Result:
[[ 0  0  1  3  0  6  0 10]
 [ 1  0  0  3  0  6 10  0]]
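Answer 1:
How about doing this instead?
a = np.array([[0, 0, 1, 2, 0, 3, 0, 4],
              [1, 0, 0, 2, 0, 3, 4, 0]], dtype=int)
# b becomes a 0/1 mask of the non-zero entries
# (this only works if every non-zero value is positive)
b = a.copy()
b[b > 0] = 1
# cumulative sum along each row, then zero out the originally-zero positions
z = np.cumsum(a, axis=1)
print(z * b)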
This yields:
array([[ 0,  0,  1,  3,  0,  6,  0, 10],
       [ 1,  0,  0,  3,  0,  6, 10,  0]])
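The mask built with `b[b > 0] = 1` only behaves as a 0/1 mask when every non-zero entry is positive, and the question notes that the real values are not 1, 2, 3, .... A minimal sketch of the same dense idea that tests `a != 0` instead is shown here; the function name `masked_cumsum_dense` and the sample array with negative entries are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def masked_cumsum_dense(a):
    """Row-wise cumulative sum, kept only where the input is non-zero."""
    a = np.asarray(a)
    running = np.cumsum(a, axis=1)
    # Positions that were zero in the input stay zero in the output.
    return np.where(a != 0, running, 0)

# Hypothetical sample data containing negative values.
example = np.array([[0, 0, 1, -2, 0, 3, 0, 4],
                    [1, 0, 0, 2, 0, -3, 4, 0]])
result = masked_cumsum_dense(example)
print(result)
```

Like the answer's version, this works on a dense array, so it allocates memory for every zero as well and does not modify a csr_matrix in place.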
Timing the sparse version (a is the dense array here, so the conversion to CSR is included in the measurement):
def sparse(a):
    a = scipy.sparse.csr_matrix(a)
    indptr = a.indptr
    data = a.data
    for i in range(a.shape[0]):
        st = indptr[i]
        en = indptr[i + 1]
        np.cumsum(data[st:en], out=data[st:en])
In[1]: %timeit sparse(a)
10000 loops, best of 3: 167 µs per loop
Timing the multiplication version:
def mult(a):
    b = a.copy()
    b[b > 0] = 1
    z = np.cumsum(a, axis=1)
    return z * b
In[2]: %timeit mult(a)
100000 loops, best of 3: 5.93 µs per loop
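The timings above compare a dense computation against a Python loop that also pays for building the CSR matrix, so they say little about a matrix that is already sparse. One possible way to drop the per-row Python loop while staying in CSR form is to take a single cumulative sum over `data` and subtract, for each row, the running total accumulated before that row starts. This is a sketch under stated assumptions: the helper name `rowwise_cumsum_inplace` is made up for illustration, and it relies on the column indices within each row being sorted, which is the case for a matrix built from a dense array as in the question.

```python
import numpy as np
import scipy.sparse

def rowwise_cumsum_inplace(m):
    """Replace the stored values of a CSR matrix by their per-row cumulative sums."""
    data, indptr = m.data, m.indptr
    if data.size == 0:
        return m
    # Running total over all stored values, ignoring row boundaries.
    total = np.cumsum(data, dtype=data.dtype)
    # prefix[k] is the sum of data[:k], so prefix[indptr[i]] is the total
    # accumulated before row i begins.
    prefix = np.zeros(total.size + 1, dtype=total.dtype)
    prefix[1:] = total
    row_offsets = prefix[indptr[:-1]]
    # Number of stored values per row; empty rows contribute nothing.
    counts = np.diff(indptr)
    data[:] = total - np.repeat(row_offsets, counts)
    return m

a = scipy.sparse.csr_matrix(
    [[0, 0, 1, 2, 0, 3, 0, 4],
     [1, 0, 0, 2, 0, 3, 4, 0]],
    dtype=int)
rowwise_cumsum_inplace(a)
print(a.toarray())
```

Two caveats apply. With floating-point data, subtracting a large global running total can lose precision compared with summing each row on its own, and with integer data the global total can overflow sooner than any single row would. If the matrix was assembled some other way, calling `a.sort_indices()` first puts the stored values of each row in column order.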
Source: https://stackoverflow.com/questions/45492626/scipy-sparse-cumsum