Adding a column of zeroes to a csr_matrix

一整个雨季 2021-01-05 07:01

I have an MxN sparse csr_matrix, and I'd like to add a few columns with only zeroes to the right of the matrix. In principle, the arrays indptr, ...

2 Answers
  • 2021-01-05 07:45

    You can use scipy.sparse.vstack or scipy.sparse.hstack to do it faster:

    from scipy.sparse import csr_matrix, vstack, hstack
    
    B = csr_matrix((5, 2), dtype=int)
    C = csr_matrix((5, 2), dtype=int)
    D = csr_matrix((10, 10), dtype=int)
    
    B2 = vstack((B, C))
    #<10x2 sparse matrix of type '<type 'numpy.int32'>'
    #        with 0 stored elements in COOrdinate format>
    
    hstack((B2, D))
    #<10x12 sparse matrix of type '<type 'numpy.int32'>'
    #        with 0 stored elements in COOrdinate format>
    

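    Note that the output shown here is a coo_matrix, which can be efficiently converted to the CSR or CSC formats with .tocsr() or .tocsc(). Both vstack and hstack also accept a format argument (for example format='csr') if you want a specific format returned directly.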
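Applied to the original question, the same idea can be sketched as follows. This is a minimal example, assuming a small made-up matrix `A` and a hypothetical column count `k`; neither name comes from the question. It builds an empty CSR block with the same number of rows and asks `hstack` for CSR output.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Small example matrix; the values are arbitrary and only for illustration.
A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 0]]))

k = 2  # hypothetical number of zero columns to append on the right

# An empty sparse block with the same row count and dtype as A.
zeros = csr_matrix((A.shape[0], k), dtype=A.dtype)

# format='csr' requests CSR output instead of relying on hstack's default.
A_wide = hstack((A, zeros), format='csr')

print(A_wide.shape)  # (3, 5)
print(A_wide.nnz)    # 5, the same stored elements as A
```

This copies the data of `A` into a new matrix, so it is the simpler but not the cheapest option for large inputs.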

  • 2021-01-05 07:54

    What you want to do isn't really what numpy or scipy understand as a reshape. But for your particular case, you can create a new CSR matrix reusing the data, indices and indptr from your original one, without copying them:

    import scipy.sparse as sps
    
    a = sps.rand(10000, 10000, density=0.01, format='csr')
    
    In [19]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
    ...                             shape=(10000, 10020), copy=True)
    100 loops, best of 3: 6.26 ms per loop
    
    In [20]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
    ...                             shape=(10000, 10020), copy=False)
    10000 loops, best of 3: 47.3 us per loop
    
    In [21]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
    ...                             shape=(10000, 10020))
    10000 loops, best of 3: 48.2 us per loop
    

    So if you no longer need your original matrix a, simply rebuild it with the wider shape. Since copy=False is the default, the new matrix reuses the existing arrays:

    a = sps.csr_matrix((a.data, a.indices, a.indptr), shape=(10000, 10020))
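A quick way to convince yourself that this trick only widens the matrix is to compare the result with the original. The sketch below assumes a small random matrix and a hypothetical `extra` column count, neither of which is from the answer. It works because, in CSR, `indptr` is indexed by row and `indices` holds column numbers, so adding columns on the right leaves all three arrays valid. The same trick would not add columns to a CSC matrix.

```python
import numpy as np
import scipy.sparse as sps

# Small random CSR matrix; the size and density are arbitrary.
a = sps.random(100, 50, density=0.05, format='csr', random_state=0)

extra = 20  # hypothetical number of zero columns to append

# Reuse the three CSR arrays and only declare a wider shape.
b = sps.csr_matrix((a.data, a.indices, a.indptr),
                   shape=(a.shape[0], a.shape[1] + extra))

# The left block equals the original and the new columns hold nothing.
same_left_block = (b[:, :a.shape[1]] != a).nnz == 0
new_columns_empty = b[:, a.shape[1]:].nnz == 0

# Reports whether b's data buffer is the same memory as a's.
shares_data = np.shares_memory(a.data, b.data)

print(b.shape, same_left_block, new_columns_empty, shares_data)
```

If `shares_data` is true, in-place changes to the stored values of one matrix will show up in the other, so pass copy=True when you need to keep both independently.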
    