Efficient way to set elements to zero where mask is True on scipy sparse matrix

迷失自我 2021-01-02 02:38

I have two scipy.sparse CSR matrices, 'a' and a boolean 'mask', and I want to set the elements of 'a' to zero wherever the corresponding element of 'mask' is True.

fo

1 answer
  • 2021-01-02 02:51

    My initial impression is that this multiply and subtract approach is a reasonable one. Quite often sparse code implements operations as some sort of multiplication, even if the dense equivalents use more direct methods. The sparse sum over rows or columns uses a matrix multiplication with the appropriate row or column matrix of 1s. Even row or column indexing uses matrix multiplication (at least on the csr format).
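    A minimal sketch of the multiply and subtract idea follows. The example matrices `a` and `mask` are assumptions, chosen only so that they agree with the outputs shown in this answer; they are not taken from the question.

```python
import numpy as np
from scipy import sparse

# Assumed example data, picked to be consistent with the outputs in this answer.
a = sparse.csr_matrix(np.array([[0, 3, 0],
                                [5, 1, 0],
                                [7, 0, 0]], dtype=np.int32))
mask = sparse.csr_matrix(np.array([[True, True, False],
                                   [True, False, False],
                                   [False, False, True]]))

# Element-wise product keeps only the entries of `a` where mask is True ...
masked_part = a.multiply(mask.astype(a.dtype))
# ... and subtracting that part leaves the entries where mask is False.
result = a - masked_part
result.eliminate_zeros()  # drop any explicitly stored zeros

print(result.toarray())
```

    Both steps stay sparse, so no dense copy of `a` or `mask` is created.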

    Sometimes we can improve on operations by working directly with the matrix attributes (data, indices, indptr). But that requires a lot more thought and experimentation.

    For dense arrays my first try would be the following (a.A is shorthand for a.toarray()):

    In [611]: a.A*~(mask.A)
    Out[611]: 
    array([[0, 0, 0],
           [0, 1, 0],
           [7, 0, 0]], dtype=int32)
    

    But there is no direct way to apply a logical NOT to a sparse matrix. If mask were truly sparse, ~mask would not be. In your example mask has 4 True terms and 5 False, so building the inverted mask from a dense array works just as well:

    In [612]: nmask=sparse.csr_matrix(~(mask.A))
    In [615]: a.multiply(nmask)
    Out[615]: 
    <3x3 sparse matrix of type '<class 'numpy.int32'>'
        with 2 stored elements in Compressed Sparse Row format>
    

    An earlier question, "CSR scipy matrix does not update after updating its values", explores setting the diagonal of a sparse matrix to 0. It is possible to set values of the data attribute to 0, and then call eliminate_zeros once at the end.
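    Applied to a mask, that idea might look like the sketch below. The example matrices and the names `rows`, `cols` and `hit` are assumptions introduced for illustration; the row index of each stored element is recovered from `indptr`.

```python
import numpy as np
from scipy import sparse

# Assumed example data, picked to be consistent with the outputs in this answer.
a = sparse.csr_matrix(np.array([[0, 3, 0],
                                [5, 1, 0],
                                [7, 0, 0]], dtype=np.int32))
mask = sparse.csr_matrix(np.array([[True, True, False],
                                   [True, False, False],
                                   [False, False, True]]))

a2 = a.copy()
# Row index of every stored element: row i owns indptr[i+1] - indptr[i] entries.
rows = np.repeat(np.arange(a2.shape[0]), np.diff(a2.indptr))
cols = a2.indices
# Look up the mask only at the positions that `a2` actually stores.
hit = np.asarray(mask[rows, cols]).ravel().astype(bool)
a2.data[hit] = 0       # overwrite values in place; sparsity structure is unchanged
a2.eliminate_zeros()   # clean up once at the end

print(a2.toarray())
```

    Because only existing entries of `data` are overwritten, this avoids the structure change that triggers a SparseEfficiencyWarning.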

    The other dense method is

    In [618]: a1=a.A
    In [619]: a1[mask.A]=0
    

    This also works on sparse matrices, sort of:

    In [622]: a2=a.copy()
    In [624]: a2[mask]
    Out[624]: matrix([[0, 3, 5, 0]], dtype=int32)
    In [625]: a2[mask]=0
    /usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
      SparseEfficiencyWarning)
    In [626]: a2
    Out[626]: 
    <3x3 sparse matrix of type '<class 'numpy.int32'>'
        with 6 stored elements in Compressed Sparse Row format>
    

    As noted in the previous question, we need to eliminate the zeros:

    In [628]: a2.eliminate_zeros()
    In [629]: a2
    Out[629]: 
    <3x3 sparse matrix of type '<class 'numpy.int32'>'
        with 2 stored elements in Compressed Sparse Row format>
    

    Taking a hint from the sparsity warning, let's try the lil format:

    In [638]: al=a.tolil()
    In [639]: al[mask]
    Out[639]: 
    <1x4 sparse matrix of type '<class 'numpy.int32'>'
        with 2 stored elements in LInked List format>
    In [640]: al[mask]=0
    In [641]: al
    Out[641]: 
    <3x3 sparse matrix of type '<class 'numpy.int32'>'
        with 2 stored elements in LInked List format>
    

    It's interesting that al[mask] is still sparse, whereas a[mask] is dense. The two formats use different indexing methods.

    If mask has only a few True entries, it might be worth iterating over the True (nonzero) elements of mask, setting the corresponding terms of a to zero directly.
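    One way to sketch that iteration, assuming the same illustrative matrices as elsewhere in this answer and using the lil format for the element-wise assignments:

```python
import numpy as np
from scipy import sparse

# Assumed example data, picked to be consistent with the outputs in this answer.
a = sparse.csr_matrix(np.array([[0, 3, 0],
                                [5, 1, 0],
                                [7, 0, 0]], dtype=np.int32))
mask = sparse.csr_matrix(np.array([[True, True, False],
                                   [True, False, False],
                                   [False, False, True]]))

al = a.tolil()                  # lil is better suited to element-wise assignment
rows, cols = mask.nonzero()     # coordinates of the True entries of mask
for r, c in zip(rows, cols):
    al[r, c] = 0                # zero one element at a time

result = al.tocsr()
result.eliminate_zeros()        # in case any explicit zeros were stored

print(result.toarray())
```

    The loop runs once per True entry of mask, so its cost grows with the number of True entries rather than with the size of a.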

    I'm not going to guess as to the relative speeds of these methods. That needs to be tested on realistic data.
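    A rough timing harness for such a test could look like this. The matrix size, the densities and the helper names `multiply_subtract` and `dense_not_mask` are all assumptions; substitute data that resembles the real problem.

```python
import timeit

import numpy as np
from scipy import sparse

# Hypothetical test data: 1000x1000 matrices with about 1% stored entries.
a = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)
mask = sparse.random(1000, 1000, density=0.01, format="csr",
                     random_state=1).astype(bool)

def multiply_subtract(a, mask):
    # Stay sparse throughout: remove the part of `a` selected by mask.
    out = a - a.multiply(mask.astype(a.dtype))
    out.eliminate_zeros()
    return out

def dense_not_mask(a, mask):
    # Invert the mask as a dense array, then multiply element-wise.
    return a.multiply(sparse.csr_matrix(~mask.toarray()))

for func in (multiply_subtract, dense_not_mask):
    seconds = timeit.timeit(lambda: func(a, mask), number=5)
    print(func.__name__, seconds / 5)
```

    The numbers depend heavily on shape and density, so they are only meaningful for data similar to what will actually be processed.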
