Get norm of numpy sparse matrix rows

Posted by ぐ巨炮叔叔 on 2019-12-03 15:17:52

Some simple fake data:

import numpy as np
from scipy import sparse
a = np.arange(9.).reshape(3,3)
s = sparse.csr_matrix(a)

To get the norm of each row of the sparse matrix, you can use:

np.sqrt(s.multiply(s).sum(1))

The row-normalized s is then:

s.multiply(1/np.sqrt(s.multiply(s).sum(1)))

or, to make sure the result stays sparse, convert the scale factors to a sparse matrix before multiplying:

s.multiply(sparse.csr_matrix(1/np.sqrt(s.multiply(s).sum(1))))
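The one-liners above divide by the row norm directly, so a row that is entirely zero produces a division by zero. Below is a minimal sketch of the same computation that guards against that case. The helper names `sparse_row_norms` and `normalize_sparse_rows` are made up for illustration, and scaling with `sparse.diags` is a substitute for the `multiply` call above, not the method used in this answer.

```python
import numpy as np
from scipy import sparse

def sparse_row_norms(s):
    """Euclidean norm of each row of a sparse matrix, as a 1-D array."""
    # s.multiply(s) squares the stored values; the row sums may come back
    # as an (n_rows, 1) np.matrix, so flatten them into a plain array.
    squared_sums = np.asarray(s.multiply(s).sum(axis=1)).ravel()
    return np.sqrt(squared_sums)

def normalize_sparse_rows(s):
    """Return a CSR matrix whose non-zero rows have unit norm."""
    norms = sparse_row_norms(s)
    inv = np.zeros_like(norms)
    # Only invert non-zero norms; all-zero rows keep a scale factor of 0.
    np.divide(1.0, norms, out=inv, where=norms > 0)
    # Left-multiplying by a diagonal matrix scales row i by inv[i].
    return (sparse.diags(inv) @ s).tocsr()

s = sparse.csr_matrix(np.array([[3.0, 4.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [0.0, 5.0, 12.0]]))
norms = sparse_row_norms(s)
s_unit = normalize_sparse_rows(s)
```

Here the middle row is all zeros on purpose: it keeps a norm of 0 and stays a zero row after normalization, while the other two rows end up with unit length.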

To get an ordinary dense matrix or array from it, use:

m = s.todense()
a = s.toarray()

If you have enough memory for the dense version, you can get the norm of each row with:

n = np.sqrt(np.einsum('ij,ij->i',a,a))

or

n = np.apply_along_axis(np.linalg.norm, 1, a)
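As a quick sanity check, the two expressions can be compared on the sample data. The sketch below also adds a third option that is not part of the answer above, `np.linalg.norm` with its `axis` argument, which should give the same values without a Python-level loop over rows.

```python
import numpy as np

a = np.arange(9.).reshape(3, 3)

# Sum of squares along each row via einsum, then a square root.
n_einsum = np.sqrt(np.einsum('ij,ij->i', a, a))

# Calls np.linalg.norm once per row; simple, but loops in Python.
n_apply = np.apply_along_axis(np.linalg.norm, 1, a)

# Vectorized 2-norm along axis 1 (added here for comparison).
n_axis = np.linalg.norm(a, axis=1)
```

All three return a 1-D array with one norm per row.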

To normalize, you can do:

an = a / n[:, None]

or, to normalize the original array in place:

a /= n[:, None]

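The [:, None] indexing adds a new axis, turning n from a 1-D array of shape (3,) into a column of shape (3, 1), so that each row of a is divided by its own norm when the two are broadcast together.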
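To make the shapes concrete, here is a small sketch on a non-square array (the 3x2 values are made up for illustration). With a plain 1-D `n`, `a / n` would fail to broadcast for this shape, and for a square array it would silently divide each column rather than each row.

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [5.0, 12.0]])

n = np.linalg.norm(a, axis=1)   # shape (3,): one norm per row
col = n[:, None]                # shape (3, 1); same as n[:, np.newaxis]

# (3, 2) / (3, 1) broadcasts the column across both columns of a,
# so row i is divided by n[i].
an = a / col
```

Note that `n.T` would not help here: transposing a 1-D array leaves it unchanged.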

scipy.sparse is a great package, and it keeps getting better with every release, but a lot of things are still only half-baked, and you can get big performance improvements by implementing some of the algorithms yourself. For instance, here is a more than 7x speedup over @askewchan's implementation, which uses scipy functions (sps below is scipy.sparse):

In [18]: a = sps.rand(1000, 1000, format='csr')

In [19]: a
Out[19]: 
<1000x1000 sparse matrix of type '<type 'numpy.float64'>'
    with 10000 stored elements in Compressed Sparse Row format>

In [20]: %timeit a.multiply(a).sum(1)
1000 loops, best of 3: 288 us per loop

In [21]: %timeit np.add.reduceat(a.data * a.data, a.indptr[:-1])
10000 loops, best of 3: 36.8 us per loop

In [24]: np.allclose(a.multiply(a).sum(1).ravel(),
    ...:             np.add.reduceat(a.data * a.data, a.indptr[:-1]))
Out[24]: True
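One caveat with the `reduceat` shortcut: it relies on every row having at least one stored element. For an empty row, `np.add.reduceat` does not return 0 but the single element at that start index, and a start index equal to `len(a.data)` (trailing empty rows) is out of bounds. The sketch below is one possible guard, assuming a CSR matrix without duplicate entries; the function name `csr_row_sq_sums` is hypothetical.

```python
import numpy as np
from scipy import sparse

def csr_row_sq_sums(m):
    """Sum of squared stored values per row of a CSR matrix."""
    sq = m.data * m.data
    out = np.zeros(m.shape[0], dtype=sq.dtype)
    # Rows with at least one stored element.
    nonempty = np.diff(m.indptr) > 0
    if nonempty.any():
        # Start offsets of non-empty rows are strictly increasing and in
        # bounds, so each reduceat segment covers exactly one row's data.
        out[nonempty] = np.add.reduceat(sq, m.indptr[:-1][nonempty])
    return out

# Rows 1 and 3 are empty; row 3 is a trailing empty row.
m = sparse.csr_matrix(np.array([[3.0, 4.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [0.0, 0.0, 2.0],
                                [0.0, 0.0, 0.0]]))
sq_sums = csr_row_sq_sums(m)
```

The extra masking costs a little time, so for matrices known to have no empty rows the plain one-liner remains the faster choice.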

You can similarly normalize the matrix in place as follows:

norm_rows = np.sqrt(np.add.reduceat(a.data * a.data, a.indptr[:-1]))
nnz_per_row = np.diff(a.indptr)
a.data /= np.repeat(norm_rows, nnz_per_row)
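For reuse, the in-place normalization can be wrapped in a function. The sketch below is a variation rather than the exact code above: it computes the row sums with `np.bincount` instead of `reduceat`, which also copes with empty rows, and it assumes a CSR matrix with a floating-point dtype. The name `normalize_csr_rows_inplace` is made up for illustration.

```python
import numpy as np
from scipy import sparse

def normalize_csr_rows_inplace(m):
    """Scale each row of a float CSR matrix to unit Euclidean norm, in place."""
    n_rows = m.shape[0]
    # Row index of every stored value, e.g. indptr [0, 2, 2, 4] -> [0, 0, 2, 2].
    row_of = np.repeat(np.arange(n_rows), np.diff(m.indptr))
    # Accumulate squared values per row; rows without entries stay at 0.
    sq_sums = np.bincount(row_of, weights=m.data * m.data, minlength=n_rows)
    norms = np.sqrt(sq_sums)
    norms[norms == 0] = 1.0          # avoid dividing by zero
    m.data /= norms[row_of]          # modifies the matrix's own buffer
    return m

m = sparse.csr_matrix(np.array([[3.0, 4.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [0.0, 5.0, 12.0]]))
normalize_csr_rows_inplace(m)
```

Because only `m.data` is touched, the sparsity pattern (`indices` and `indptr`) is left exactly as it was.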

If you are going to use sparse matrices often, read the Wikipedia page on compressed sparse formats; you will often find better ways of doing things than the defaults.
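As a starting point, here is a small sketch of what the three CSR arrays look like for a tiny matrix (the values are made up for illustration). Knowing this layout is what makes tricks like operating directly on `data` and `indptr` possible.

```python
import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 0.0, 0.0],
                                [0.0, 3.0, 4.0]]))

# data    holds the stored values, row by row:      [1, 2, 3, 4]
# indices holds the column of each stored value:    [0, 2, 1, 2]
# indptr  holds where each row starts in data:      [0, 2, 2, 4]
rows = []
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    # Row i owns data[start:end]; an empty row has start == end.
    rows.append((m.indices[start:end].tolist(), m.data[start:end].tolist()))
```

After the loop, `rows[i]` contains the column indices and values stored for row `i`, with two empty lists for the all-zero middle row.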
