Get norm of numpy sparse matrix rows

Some simple fake data:

import numpy as np
from scipy import sparse
a = np.arange(9.).reshape(3, 3)
s = sparse.csr_matrix(a)

To get the norm of each row from the sparse matrix, you can use:
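
A minimal sketch of one way to do this, assuming the element-wise `multiply` followed by a row `sum` that is timed further down. The flattening to a 1-D array is my own addition, since `sum(axis=1)` on a sparse matrix gives back a 2-D result:

```python
import numpy as np
from scipy import sparse

a = np.arange(9.).reshape(3, 3)
s = sparse.csr_matrix(a)

# Square element-wise (the product stays sparse), then sum along each row.
row_sq_sums = s.multiply(s).sum(axis=1)

# Flatten the (n_rows, 1) result to a 1-D array before taking the root.
n = np.sqrt(np.asarray(row_sq_sums).ravel())
```

Only the stored non-zeros are squared, so the dense array is never built.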


And the renormalized s would be
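
One way to write the renormalization, as a sketch rather than a definitive form: multiply s by the reciprocal norms, shaped as a column so that they broadcast across each row. Whether `multiply` returns a sparse or a dense result for a dense argument may differ between SciPy versions, so the last line accepts either. The example assumes no row has zero norm.

```python
import numpy as np
from scipy import sparse

a = np.arange(9.).reshape(3, 3)
s = sparse.csr_matrix(a)
n = np.sqrt(np.asarray(s.multiply(s).sum(axis=1)).ravel())

# Reciprocal norms as an (n_rows, 1) column, so row i is scaled by 1 / n[i].
# Assumes every norm is non-zero.
sn = s.multiply(1.0 / n[:, None])

# Convert to a plain ndarray whichever type came back.
sn_dense = sn.toarray() if sparse.issparse(sn) else np.asarray(sn)
```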


or to keep it sparse before renormalizing:
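
A sketch of one approach that stays sparse throughout, assuming no row has zero norm: left-multiply by a sparse diagonal matrix of reciprocal norms. This swaps in `sparse.diags` for the row scaling; it is not necessarily the form the answer originally used.

```python
import numpy as np
from scipy import sparse

a = np.arange(9.).reshape(3, 3)
s = sparse.csr_matrix(a)
n = np.sqrt(np.asarray(s.multiply(s).sum(axis=1)).ravel())

# diag(1 / n) @ s scales row i by 1 / n[i]; a product of two sparse
# matrices is itself sparse. Assumes every norm is non-zero.
sn = (sparse.diags(1.0 / n) @ s).tocsr()
```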


To get an ordinary matrix or array from it, use:

m = s.todense()
a = s.toarray()

If you have enough memory for the dense version, you can get the norm of each row with:

n = np.sqrt(np.einsum('ij,ij->i', a, a))

or, equivalently:

n = np.apply_along_axis(np.linalg.norm, 1, a)
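
In NumPy 1.8 and later, `np.linalg.norm` also accepts an `axis` argument, which gives the same row norms in a single call. A minimal sketch comparing it with the `einsum` version:

```python
import numpy as np

a = np.arange(9.).reshape(3, 3)

# Sum of squares per row via einsum, then the square root.
n_einsum = np.sqrt(np.einsum('ij,ij->i', a, a))

# Same result; the axis argument needs NumPy 1.8 or later.
n_norm = np.linalg.norm(a, axis=1)
```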

To normalize, you can do:

an = a / n[:, None]

or, to normalize the original array in place:

a /= n[:, None]

The [:, None] indexing adds a new axis, turning n from a 1-D array of shape (3,) into a column of shape (3, 1), so that each row of a is divided by its own norm.
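
To make the broadcasting concrete, here is a small sketch showing the shapes involved and what happens if the extra axis is left out:

```python
import numpy as np

a = np.arange(9.).reshape(3, 3)
n = np.linalg.norm(a, axis=1)

col = n[:, None]   # same as n[:, np.newaxis] or n.reshape(-1, 1)
an = a / col       # row i is divided by n[i]

# Without the new axis, n lines up with the columns instead:
# column j is divided by n[j], which is not a row normalization.
wrong = a / n
```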

scipy.sparse is a great package, and it keeps getting better with every release, but a lot of it is still only half-baked, and you can get big performance improvements by implementing some of the algorithms yourself. For instance, working directly on the CSR arrays gives a more than 7x speedup over @askewchan's implementation, which uses scipy functions:

In [18]: a = sps.rand(1000, 1000, format='csr')

In [19]: a
<1000x1000 sparse matrix of type '<type 'numpy.float64'>'
    with 10000 stored elements in Compressed Sparse Row format>

In [20]: %timeit a.multiply(a).sum(1)
1000 loops, best of 3: 288 us per loop

In [21]: %timeit np.add.reduceat(a.data * a.data, a.indptr[:-1])
10000 loops, best of 3: 36.8 us per loop

In [24]: np.allclose(a.multiply(a).sum(1).ravel(),
    ...:             np.add.reduceat(a.data * a.data, a.indptr[:-1]))
Out[24]: True

You can similarly normalize the array in place by doing the following:

norm_rows = np.sqrt(np.add.reduceat(a.data * a.data, a.indptr[:-1]))
nnz_per_row = np.diff(a.indptr)
a.data /= np.repeat(norm_rows, nnz_per_row)
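
The snippet above relies on every row holding at least one stored value: `np.add.reduceat` returns a single element rather than zero for an empty segment, and raises an IndexError if a trailing empty row pushes an index past the end of `data`. Below is a minimal sketch of a wrapper that skips empty rows. The function name `normalize_csr_rows_inplace` is made up for illustration, and the matrix is assumed to be CSR with floating-point data.

```python
import numpy as np
import scipy.sparse as sps

def normalize_csr_rows_inplace(m):
    """Scale each non-empty row of a float CSR matrix to unit L2 norm, in place."""
    if m.nnz == 0:
        return m  # nothing stored, nothing to scale
    nnz_per_row = np.diff(m.indptr)
    nonempty = nnz_per_row > 0
    # Pass only the start offsets of non-empty rows to reduceat.
    starts = m.indptr[:-1][nonempty]
    norms = np.sqrt(np.add.reduceat(m.data * m.data, starts))
    norms[norms == 0] = 1.0  # rows holding only explicit zeros stay unchanged
    # Expand one norm per row to one divisor per stored value.
    m.data /= np.repeat(norms, nnz_per_row[nonempty])
    return m

m = sps.csr_matrix(np.array([[3., 4., 0.],
                             [0., 0., 0.],
                             [0., 5., 12.],
                             [0., 0., 0.]]))
normalize_csr_rows_inplace(m)
```

Only `m.data` is modified; `m.indices` and `m.indptr` are left as they were, so the sparsity pattern is preserved.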

If you are going to be using sparse matrices often, read the Wikipedia page on compressed sparse formats; you will often find better ways to do things than the defaults.
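
As a small illustration of why working on the raw arrays pays off, here is a sketch of how a CSR matrix lays out its three arrays, using a made-up 3x3 example:

```python
import numpy as np
import scipy.sparse as sps

m = sps.csr_matrix(np.array([[1., 0., 2.],
                             [0., 0., 3.],
                             [4., 5., 6.]]))

print(m.data)     # stored non-zero values, row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row i owns data[indptr[i]:indptr[i + 1]]

# Recover the stored values and columns of row 2 from the three arrays.
start, stop = m.indptr[2], m.indptr[3]
row2_values = m.data[start:stop]
row2_columns = m.indices[start:stop]
```

Because each row is a contiguous slice of `data`, per-row reductions such as the sum of squares can be computed without touching `indices` at all.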
