I have an MxN sparse csr_matrix, and I'd like to add a few columns with only zeroes to the right of the matrix. In principle, the arrays indptr, ...
You can use scipy.sparse.vstack or scipy.sparse.hstack to do it faster:
from scipy.sparse import csr_matrix, vstack, hstack
B = csr_matrix((5, 2), dtype=int)
C = csr_matrix((5, 2), dtype=int)
D = csr_matrix((10, 10), dtype=int)
B2 = vstack((B, C))
#<10x2 sparse matrix of type '<type 'numpy.int32'>'
# with 0 stored elements in COOrdinate format>
hstack((B2, D))
#<10x12 sparse matrix of type '<type 'numpy.int32'>'
# with 0 stored elements in COOrdinate format>
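Note that the output here is a coo_matrix, which can be efficiently converted to the CSR or CSC formats with .tocsr() or .tocsc(). Passing format='csr' to vstack or hstack returns a CSR matrix directly.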
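Applied to the question as asked (appending zero columns to an existing CSR matrix), the hstack approach might look like the sketch below. The matrix values and the names A, k, zeros and A_padded are made up for illustration; only csr_matrix and hstack come from the answer above.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Small example matrix (values chosen only for illustration).
A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 0]]))
k = 2  # number of all-zero columns to append on the right

# An empty sparse block with the same number of rows as A and k columns.
zeros = csr_matrix((A.shape[0], k), dtype=A.dtype)

# format='csr' requests a CSR result regardless of hstack's default output format.
A_padded = hstack((A, zeros), format='csr')

print(A_padded.shape)
print(A_padded.format)
```

The stored entries of A are carried over unchanged; only the column count grows by k.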
What you want to do isn't really what numpy or scipy understand as a reshape. But for your particular case, you can create a new CSR matrix reusing the data, indices and indptr from your original one, without copying them:
import scipy.sparse as sps
a = sps.rand(10000, 10000, density=0.01, format='csr')
In [19]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020), copy=True)
100 loops, best of 3: 6.26 ms per loop
In [20]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020), copy=False)
10000 loops, best of 3: 47.3 us per loop
In [21]: %timeit sps.csr_matrix((a.data, a.indices, a.indptr),
... shape=(10000, 10020))
10000 loops, best of 3: 48.2 us per loop
Since the default is copy=False, the new matrix is built without copying the arrays of the original. So if you no longer need your original matrix a, simply do:
a = sps.csr_matrix((a.data, a.indices, a.indptr), shape=(10000, 10020))
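The same idea can be wrapped so that the new shape is derived from the matrix instead of being hard-coded. This is a minimal sketch: pad_columns is a hypothetical helper name, not a SciPy function, and the matrix size, density and seed are arbitrary.

```python
import numpy as np
import scipy.sparse as sps

def pad_columns(m, k):
    """Return a CSR matrix equal to m with k empty columns appended on the right.

    pad_columns is a hypothetical helper, not part of SciPy. It relies on
    the default copy=False, so the arrays of m are passed through as-is.
    """
    rows, cols = m.shape
    return sps.csr_matrix((m.data, m.indices, m.indptr), shape=(rows, cols + k))

# Arbitrary test matrix; random_state only makes the example repeatable.
a = sps.random(100, 100, density=0.05, format='csr', random_state=0)
b = pad_columns(a, 20)

print(b.shape)
# Check whether the value array is shared between a and b rather than copied.
print(np.shares_memory(a.data, b.data))
```

Because the arrays are not copied, in-place changes to the values of one matrix would be expected to show up in the other. Pass copy=True if the two need to be independent. This trick only widens a CSR matrix; adding rows to a CSR matrix would also require extending indptr.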