Efficient slicing of matrices using matrix multiplication, with Python, NumPy, SciPy

别跟我提以往 2021-01-06 09:05

I want to reshape a 2D scipy.sparse.csr.csr_matrix (let us call it A) into a 2D numpy.ndarray (let us call it B).

1 answer
  • 2021-01-06 09:26

    With matrix multiplication you can slice efficiently by building a "slicer" matrix that has ones in the right places. The result has the same type as the "slicer", so the slicer also gives you an efficient way to control the output type.
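
    As a minimal sketch of this idea, the selector can have exactly one row per requested row, which avoids slicing the product afterwards. The helper name `row_slicer`, its `dense` flag, and the 6x6 matrix are hypothetical illustration choices, not part of SciPy or of the original answer:

```python
import numpy as np
from scipy.sparse import csr_matrix

def row_slicer(start, stop, n_rows, dense=True):
    """Build a (stop - start) x n_rows selector with a single 1 per row."""
    k = stop - start
    rows = np.arange(k)            # output row index
    cols = np.arange(start, stop)  # input row that each output row picks
    data = np.ones(k, dtype=int)
    selector = csr_matrix((data, (rows, cols)), shape=(k, n_rows))
    return selector.toarray() if dense else selector

# Made-up 6x6 matrix holding the values 1..36
A = csr_matrix(np.arange(1, 37).reshape(6, 6))

# A dense selector gives a dense result, a sparse selector a sparse one
dense_result = row_slicer(2, 5, A.shape[0], dense=True) @ A
sparse_result = row_slicer(2, 5, A.shape[0], dense=False) @ A
```

    Both results hold rows 2 to 4 of A; only the container type differs.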

    The comparisons below show that the most efficient option for your case is to take the dense .A matrix and slice it. In these timings it was much faster than the .toarray() method. Multiplication is the second fastest option when the "slicer" is created as an ndarray, multiplied by the csr matrix, and the result is then sliced.

    NOTE: using a coo sparse matrix for A gave slightly slower timings with the same proportions, and sol3 is not applicable in that case. I realized later that the multiplication converts it to csr automatically.
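
    One way to still use the slice-then-densify pattern of sol3 on a coo input is to convert it to csr explicitly first. This is a minimal sketch, assuming the one-off conversion cost is acceptable; the 6x6 matrix is made-up illustration data:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up 6x6 input stored in COO format
dense = np.arange(1, 37).reshape(6, 6)
A_coo = coo_matrix(dense)

# Convert to CSR once, slice rows 2..4, and densify only that slice
B = A_coo.tocsr()[2:5].toarray()
```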

    import numpy as np
    from scipy.sparse import csr_matrix

    # 9x9 test matrix; the value "rc" sits at row r, column c (1-based)
    test = csr_matrix([
        [11,12,13,14,15,16,17,18,19],
        [21,22,23,24,25,26,27,28,29],
        [31,32,33,34,35,36,37,38,39],
        [41,42,43,44,45,46,47,48,49],
        [51,52,53,54,55,56,57,58,59],
        [61,62,63,64,65,66,67,68,69],
        [71,72,73,74,75,76,77,78,79],
        [81,82,83,84,85,86,87,88,89],
        [91,92,93,94,95,96,97,98,99]])

    def sol1():
        # densify the whole matrix, then slice rows 2..4
        B = test.A[2:5]
        return B

    def sol2():
        # dense slicer: rows 2..4 pick rows 2..4 of test, rows 0..1 stay zero
        slicer = np.array([[0,0,0,0,0,0,0,0,0],
                           [0,0,0,0,0,0,0,0,0],
                           [0,0,1,0,0,0,0,0,0],
                           [0,0,0,1,0,0,0,0,0],
                           [0,0,0,0,1,0,0,0,0]])
        B = (slicer @ test)[2:]  # ndarray @ csr_matrix gives an ndarray
        return B

    def sol3():
        # slice the sparse matrix first, then densify the slice
        B = (test[2:5]).A
        return B

    def sol4():
        # sparse slicer built from (data, (row, col)) triplets
        slicer = csr_matrix( ((1,1,1),((2,3,4),(2,3,4))), shape=(5,9) )
        B = ((slicer @ test).A)[2:] # just changing when we do the slicing
        return B

    def sol5():
        slicer = csr_matrix( ((1,1,1),((2,3,4),(2,3,4))), shape=(5,9) )
        B = ((slicer @ test)[2:]).A
        return B


    # timings measured in an IPython session
    %timeit sol1()
    #10000 loops, best of 3: 60.4 us per loop

    %timeit sol2()
    #10000 loops, best of 3: 91.4 us per loop

    %timeit sol3()
    #10000 loops, best of 3: 111 us per loop

    %timeit sol4()
    #1000 loops, best of 3: 310 us per loop

    %timeit sol5()
    #1000 loops, best of 3: 363 us per loop
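
    Outside IPython, a similar comparison can be run with the standard-library timeit module. The sketch below is a variant under stated assumptions, not a reproduction of the measurement above: the function names are hypothetical, it uses .toarray() and a 3x9 dense selector, and it first checks that all approaches return the same rows. Absolute numbers will depend on the machine and on the SciPy version.

```python
import timeit
import numpy as np
from scipy.sparse import csr_matrix

# Made-up 9x9 matrix holding the values 1..81
A = csr_matrix(np.arange(1, 82).reshape(9, 9))

def dense_then_slice():
    return A.toarray()[2:5]

def slice_then_dense():
    return A[2:5].toarray()

def multiply_by_selector():
    # 3x9 dense selector: output row i picks input row i + 2
    selector = np.zeros((3, 9), dtype=int)
    selector[[0, 1, 2], [2, 3, 4]] = 1
    return selector @ A

candidates = [dense_then_slice, slice_then_dense, multiply_by_selector]

# All approaches must agree before their timings are worth comparing
reference = candidates[0]()
for func in candidates:
    assert np.array_equal(func(), reference)

# Best of 3 repeats of 200 calls each, reported per call in microseconds
n_calls = 200
timings = {}
for func in candidates:
    best = min(timeit.repeat(func, number=n_calls, repeat=3))
    timings[func.__name__] = best / n_calls * 1e6
    print(f"{func.__name__}: {timings[func.__name__]:.1f} us per loop")
```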
    

    EDIT: the answer has been updated to use .A instead of .toarray(), which gave much faster results, and the best solutions are now listed first.
