why is row indexing of scipy csr matrices slower compared to numpy arrays

Asked by 星月不相逢, 2021-01-03 17:38

I'm not sure what I am doing wrong, but it seems that row indexing a scipy csr_matrix is roughly twofold slower than row indexing a numpy array (see code below).

1 Answer
  • Answered 2021-01-03 18:09

    The short answer, demonstrated below, is that constructing a new sparse matrix is expensive. There's a significant overhead that does not depend on the number of rows, or on the number of nonzero elements in a particular row.


    Data representation for sparse matrices is quite different from that for dense arrays. Arrays store the data in one contiguous buffer, and efficiently use the shape and strides to iterate over selected values. Those values, plus the index, define exactly where in the buffer the data will be found. Copying those N bytes from one location to another is a relatively minor part of the whole operation.

    A sparse matrix stores the data in several arrays (or other structures), containing the indices and the data. Selecting a row then requires looking up the relevant indices, and constructing a new sparse matrix with the selected indices and data. There is compiled code in the sparse package, but not nearly as much low-level code as with numpy arrays.

    To illustrate, I'll make a small matrix that isn't too sparse, so we don't have a lot of empty rows:

    In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()
    In [260]: A
    Out[260]: 
    <5x5 sparse matrix of type '<class 'numpy.float64'>'
        with 10 stored elements in Compressed Sparse Row format>
    

    The dense equivalent, and a row copy:

    In [262]: Ad=A.A
    In [263]: Ad
    Out[263]: 
    array([[  0.,   0.,   0.,   0.,  10.],
           [  0.,   0.,   0.,   0.,   0.],
           [ 17.,  16.,  14.,  19.,   6.],
           [  0.,   0.,   1.,   0.,   0.],
           [ 14.,   0.,   9.,   0.,   0.]])
    In [264]: Ad[4,:]
    Out[264]: array([ 14.,   0.,   9.,   0.,   0.])
    In [265]: timeit Ad[4,:].copy()
    100000 loops, best of 3: 4.58 µs per loop
    

    A matrix row:

    In [266]: A[4,:]
    Out[266]: 
    <1x5 sparse matrix of type '<class 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>
    

    Look at the data representation for this csr matrix, 3 1d arrays:

    In [267]: A.data
    Out[267]: array([  0.,  10.,  17.,  16.,  14.,  19.,   6.,   1.,  14.,   9.])  
    In [268]: A.indices
    Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)
    In [269]: A.indptr
    Out[269]: array([ 0,  2,  2,  7,  8, 10], dtype=int32)
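
    One consequence of this layout is that the number of stored elements in each row is just the difference of consecutive indptr entries. A quick sketch (the matrix is rebuilt here from the dense values shown above; note that converting from dense drops the explicitly stored zero, so row 0 gets 1 rather than 2 stored elements):

```python
import numpy as np
from scipy import sparse

# Rebuild A from the dense values shown earlier.
Ad = np.array([[ 0.,  0.,  0.,  0., 10.],
               [ 0.,  0.,  0.,  0.,  0.],
               [17., 16., 14., 19.,  6.],
               [ 0.,  0.,  1.,  0.,  0.],
               [14.,  0.,  9.,  0.,  0.]])
A = sparse.csr_matrix(Ad)

# indptr[i+1] - indptr[i] == number of stored elements in row i
print(np.diff(A.indptr))  # [1 0 5 1 2]
```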
    

    Here's how the row is selected (but in compiled code):

    In [270]: A.indices[A.indptr[4]:A.indptr[5]]
    Out[270]: array([0, 2], dtype=int32)
    In [271]: A.data[A.indptr[4]:A.indptr[5]]
    Out[271]: array([ 14.,   9.])
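
    If all you need are a row's values, the slicing above can be wrapped in a small helper that skips building a new sparse matrix entirely. A sketch (the matrix is rebuilt from the dense values above, and the helper name csr_row_dense is mine, not part of scipy):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[ 0.,  0.,  0.,  0., 10.],
                                [ 0.,  0.,  0.,  0.,  0.],
                                [17., 16., 14., 19.,  6.],
                                [ 0.,  0.,  1.,  0.,  0.],
                                [14.,  0.,  9.,  0.,  0.]]))

def csr_row_dense(A, i):
    """Return row i of a CSR matrix as a dense 1-D array,
    using only the raw data/indices/indptr arrays."""
    start, stop = A.indptr[i], A.indptr[i + 1]
    row = np.zeros(A.shape[1], dtype=A.dtype)
    row[A.indices[start:stop]] = A.data[start:stop]
    return row

print(csr_row_dense(A, 4))  # dense row: 14., 0., 9., 0., 0.
```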
    

    The 'row' is another sparse matrix, with the same sort of data arrays:

    In [272]: A[4,:].indptr
    Out[272]: array([0, 2])
    In [273]: A[4,:].indices
    Out[273]: array([0, 2])
    In [274]: timeit A[4,:]
    10000 loops, best of 3: 145 µs per loop
    In [275]: timeit Ad[4,:].copy()
    100000 loops, best of 3: 4.56 µs per loop
    

    Yes, timing for the sparse matrix is slow. I don't know how much of the time is spent in actually selecting the data, and how much is spent constructing the new matrix.
    

    The lil format may be easier to understand, since the data and indices are stored in sublists, one per row.

    In [276]: Al=A.tolil()
    In [277]: Al.data
    Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)
    In [278]: Al.rows
    Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)
    In [279]: Al[4,:].data
    Out[279]: array([[14.0, 9.0]], dtype=object)
    In [280]: Al[4,:].rows
    Out[280]: array([[0, 2]], dtype=object)
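
    Because each row lives in its own sublist, pulling a row's column indices and values out of a lil matrix is plain Python list indexing, with no new matrix construction. A sketch (again with the matrix rebuilt from the dense values above):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[ 0.,  0.,  0.,  0., 10.],
                                [ 0.,  0.,  0.,  0.,  0.],
                                [17., 16., 14., 19.,  6.],
                                [ 0.,  0.,  1.,  0.,  0.],
                                [14.,  0.,  9.,  0.,  0.]]))
Al = A.tolil()

# Row 4 as (column indices, values): just element access on the
# object arrays, no new sparse matrix is built.
cols, vals = Al.rows[4], Al.data[4]
print(cols, vals)
```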
    

    Speed comparisons like this make the most sense when dealing with tight compiled code, where moving bytes from one part of memory to another is the significant time consumer. With the mix of Python and compiled code in numpy and scipy, you can't just count O(n) operations.

    =============================

    Here's an estimate of the time it takes to select a row from A, versus the time it takes to also return a new sparse matrix:

    Just get the data:

    In [292]: %%timeit
    d1=A.data[A.indptr[4]:A.indptr[5]]
    i1=A.indices[A.indptr[4]:A.indptr[5]]
       .....: 
    100000 loops, best of 3: 4.92 µs per loop
    

    plus the time it takes to make a matrix:

    In [293]: %%timeit
    d1=A.data[A.indptr[4]:A.indptr[5]]
    i1=A.indices[A.indptr[4]:A.indptr[5]]
    sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))
       .....: 
    1000 loops, best of 3: 445 µs per loop
    

    Try a simpler coo matrix:

    In [294]: %%timeit
    d1=A.data[A.indptr[4]:A.indptr[5]]
    i1=A.indices[A.indptr[4]:A.indptr[5]]
    sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))
       .....: 
    10000 loops, best of 3: 135 µs per loop
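
    Another option is the native (data, indices, indptr) form of the csr_matrix constructor, which should skip the coo-style conversion and sorting, leaving mostly validation overhead. A sketch (not a measured benchmark; the matrix is again rebuilt from the dense values above):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[ 0.,  0.,  0.,  0., 10.],
                                [ 0.,  0.,  0.,  0.,  0.],
                                [17., 16., 14., 19.,  6.],
                                [ 0.,  0.,  1.,  0.,  0.],
                                [14.,  0.,  9.,  0.,  0.]]))

# Slice out row 4's values and column indices, as before.
d1 = A.data[A.indptr[4]:A.indptr[5]]
i1 = A.indices[A.indptr[4]:A.indptr[5]]

# Native-format constructor: (data, indices, indptr).  A 1-row matrix
# needs indptr = [0, nnz_in_row].
row = sparse.csr_matrix((d1, i1, np.array([0, len(d1)])), shape=(1, 5))
print(np.array_equal(row.toarray(), A[4, :].toarray()))  # True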
    