I'm not sure what I am doing wrong, but it seems that row indexing a scipy csr_matrix
is approximately twofold slower than with numpy arrays (see code below).
The short answer, demonstrated below, is that constructing a new sparse matrix is expensive. There's a significant overhead that does not depend on the number of rows, or on the number of nonzero elements in a particular row.
Data representation for sparse matrices is quite different from that for dense arrays. Arrays store their data in one contiguous buffer, and efficiently use the shape and strides to iterate over selected values. Those values, plus the index, define exactly where in the buffer the data will be found. Copying those N bytes from one location to another is a relatively minor part of the whole operation.
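For example, a row of a dense array is just a view into that buffer; a minimal sketch (the array here is only illustrative):

import numpy as np
x = np.zeros((5, 5))
x.shape, x.strides      # (5, 5), (40, 8): 8-byte floats, rows 40 bytes apart
row = x[4, :]           # a view: new shape/strides into the same buffer, no data copied
row.base is x           # True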
A sparse matrix stores the data in several arrays (or other structures), containing the indices and the data. Selecting a row then requires looking up the relevant indices, and constructing a new sparse matrix with the selected indices and data. There is compiled code in the sparse package, but not nearly as much low-level code as with numpy arrays.
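Conceptually (a rough sketch, not scipy's actual compiled routine), A[i,:] for a csr matrix A does something like:

# assuming 'from scipy import sparse', as used below
# slice out row i's values and column indices, then build a new 1-row csr matrix
start, stop = A.indptr[i], A.indptr[i+1]
row = sparse.csr_matrix((A.data[start:stop],
                         A.indices[start:stop],
                         [0, stop - start]),      # indptr for a single row
                        shape=(1, A.shape[1]))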
To illustrate I'll make a small, fairly dense matrix, so we don't have a lot of empty rows:
In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()
In [260]: A
Out[260]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
The dense equivalent, and a row copy:
In [262]: Ad=A.A
In [263]: Ad
Out[263]:
array([[  0.,   0.,   0.,   0.,  10.],
       [  0.,   0.,   0.,   0.,   0.],
       [ 17.,  16.,  14.,  19.,   6.],
       [  0.,   0.,   1.,   0.,   0.],
       [ 14.,   0.,   9.,   0.,   0.]])
In [264]: Ad[4,:]
Out[264]: array([ 14., 0., 9., 0., 0.])
In [265]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.58 µs per loop
A matrix row:
In [266]: A[4,:]
Out[266]:
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
Look at the data representation for this csr matrix, three 1d arrays:
In [267]: A.data
Out[267]: array([ 0., 10., 17., 16., 14., 19., 6., 1., 14., 9.])
In [268]: A.indices
Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)
In [269]: A.indptr
Out[269]: array([ 0, 2, 2, 7, 8, 10], dtype=int32)
Here's how the row is selected (but in compiled code):
In [270]: A.indices[A.indptr[4]:A.indptr[5]]
Out[270]: array([0, 2], dtype=int32)
In [271]: A.data[A.indptr[4]:A.indptr[5]]
Out[271]: array([ 14., 9.])
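Those slices carry all the information the row contains. As a pure-Python sketch (for illustration only, not scipy's compiled routine), they are enough to rebuild the dense row:

import numpy as np

def csr_row_dense(M, i):
    # rebuild row i of the dense equivalent from the csr arrays
    start, stop = M.indptr[i], M.indptr[i + 1]
    row = np.zeros(M.shape[1], dtype=M.dtype)
    row[M.indices[start:stop]] = M.data[start:stop]
    return row

# csr_row_dense(A, 4) -> array([ 14., 0., 9., 0., 0.])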
The 'row' is another sparse matrix, with the same sort of data arrays:
In [272]: A[4,:].indptr
Out[272]: array([0, 2])
In [273]: A[4,:].indices
Out[273]: array([0, 2])
In [274]: timeit A[4,:]
10000 loops, best of 3: 145 µs per loop
In [275]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.56 µs per loop
Yes, timing for the sparse matrix is slow. I don't know how much of the time is spent in actually selecting the data, and how much is spent constructing the new matrix.
lil format may be easier to understand, since the data and indices are stored in sublists, one per row.
In [276]: Al=A.tolil()
In [277]: Al.data
Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)
In [278]: Al.rows
Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)
In [279]: Al[4,:].data
Out[279]: array([[14.0, 9.0]], dtype=object)
In [280]: Al[4,:].rows
Out[280]: array([[0, 2]], dtype=object)
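Row lookup in lil form is essentially Python list indexing; reading the per-row lists directly (same Al as above) shows what Al[4,:] has to wrap into a new 1-row matrix:

Al.rows[4]     # [0, 2]        column indices for row 4
Al.data[4]     # [14.0, 9.0]   the corresponding values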
Speed comparisons like this make some sense when dealing with tight compiled code, where moving bytes from one part of memory to another is the significant time consumer. With the mix of Python and compiled code in numpy and scipy you can't just count O(n) operations.
=============================
Here's an estimate of the time it takes to select a row from A, and the time it takes to return a new sparse matrix:
Just get the data:
In [292]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
.....:
100000 loops, best of 3: 4.92 µs per loop
plus the time it takes to make a matrix:
In [293]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))
.....:
1000 loops, best of 3: 445 µs per loop
Try a simpler coo matrix:
In [294]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))
.....:
10000 loops, best of 3: 135 µs per loop