I want to reshape a 2d scipy.sparse.csr.csr_matrix (let us call it A) to a 2d numpy.ndarray (let us call this B).
Using matrix multiplication, you can slice efficiently by building a "slicer" matrix with ones in the right places. The sliced matrix will have the same type as the "slicer", so you can control your output type efficiently.
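To make the "slicer" idea concrete, here is a minimal sketch. The helper name make_slicer and the small 4x5 matrix are assumptions for illustration only; unlike the 5x9 slicer used in the solutions further down, this one has exactly one row per selected row, so no extra slicing is needed afterwards.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative matrix (values are arbitrary).
A = csr_matrix(np.arange(20).reshape(4, 5))

def make_slicer(rows, n_rows):
    """Hypothetical helper: sparse matrix with a single 1 per selected row."""
    data = np.ones(len(rows), dtype=int)
    row_idx = np.arange(len(rows))
    return csr_matrix((data, (row_idx, rows)), shape=(len(rows), n_rows))

rows = [1, 3]
slicer = make_slicer(rows, A.shape[0])

# Row i of the product is row rows[i] of A. Both operands are sparse,
# so the product is sparse and is densified explicitly at the end.
B = (slicer @ A).toarray()
```

Here the slicer is sparse, so the product stays sparse until .toarray() is called; whether that is faster than slicing directly depends on the matrix size and should be measured.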
Below you will see some comparisons. The most efficient option for your case is to take the .A attribute (the dense array) and slice it; in my timings this was much faster than the .toarray() method. Multiplication is the second-fastest option, when the "slicer" is created as an ndarray, multiplied by the csr matrix, and the result is then sliced.
OBS: using a coo sparse matrix for A resulted in slightly slower timings, with the same proportions between solutions, and sol3 is not applicable. I realized later that in the multiplication it is converted to csr automatically.
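If A starts out in coo format, one way to make the slice-first approach (sol3) usable is to convert to csr explicitly before slicing. This is only a sketch with an arbitrary 4x5 matrix, assuming the extra conversion cost is acceptable for your data:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Arbitrary illustrative data stored in COO format.
A_coo = coo_matrix(np.arange(20).reshape(4, 5))

# Convert once to CSR, which supports row slicing, then densify the slice.
A_csr = A_coo.tocsr()
B = A_csr[1:3].toarray()
```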
import numpy as np
from scipy.sparse import csr_matrix

test = csr_matrix([
    [11,12,13,14,15,16,17,18,19],
    [21,22,23,24,25,26,27,28,29],
    [31,32,33,34,35,36,37,38,39],
    [41,42,43,44,45,46,47,48,49],
    [51,52,53,54,55,56,57,58,59],
    [61,62,63,64,65,66,67,68,69],
    [71,72,73,74,75,76,77,78,79],
    [81,82,83,84,85,86,88,88,89],
    [91,92,93,94,95,96,99,98,99]])

def sol1():
    # densify the whole matrix, then slice the ndarray
    B = test.A[2:5]
    return B

def sol2():
    # "slicer" as a dense ndarray; * is matrix multiplication for csr_matrix
    slicer = np.array([[0,0,0,0,0,0,0,0,0],
                       [0,0,0,0,0,0,0,0,0],
                       [0,0,1,0,0,0,0,0,0],
                       [0,0,0,1,0,0,0,0,0],
                       [0,0,0,0,1,0,0,0,0]])
    B = (slicer*test)[2:]
    return B

def sol3():
    # slice the sparse matrix first, then densify
    B = (test[2:5]).A
    return B

def sol4():
    # "slicer" as a csr matrix; densify the product, then slice
    slicer = csr_matrix(((1,1,1),((2,3,4),(2,3,4))), shape=(5,9))
    B = ((slicer*test).A)[2:] # just changing when we do the slicing
    return B

def sol5():
    # "slicer" as a csr matrix; slice the sparse product, then densify
    slicer = csr_matrix(((1,1,1),((2,3,4),(2,3,4))), shape=(5,9))
    B = ((slicer*test)[2:]).A
    return B
timeit sol1()
#10000 loops, best of 3: 60.4 us per loop
timeit sol2()
#10000 loops, best of 3: 91.4 us per loop
timeit sol3()
#10000 loops, best of 3: 111 us per loop
timeit sol4()
#1000 loops, best of 3: 310 us per loop
timeit sol5()
#1000 loops, best of 3: 363 us per loop
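The timings above come from IPython's timeit magic. To re-run a comparison as a plain script, a sketch with the standard timeit module could look like the following. The function names and the 9x9 test matrix are assumptions for illustration, it uses .toarray() throughout, and the numbers it prints depend on your machine and SciPy version, so none are quoted here.

```python
import timeit

import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 9x9 matrix; the actual values do not matter for timing.
test = csr_matrix(np.arange(81).reshape(9, 9))

def dense_then_slice():
    # densify everything, then slice the ndarray
    return test.toarray()[2:5]

def slice_then_dense():
    # slice the sparse matrix, then densify only the slice
    return test[2:5].toarray()

for fn in (dense_then_slice, slice_then_dense):
    # best of 3 repeats, 1000 calls each, reported per call
    best = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
    print(f"{fn.__name__}: {best * 1e6:.1f} us per loop")
```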
EDIT: the answer has been updated, replacing .toarray() with .A, which gives much faster results; the best solutions are now placed on top.