Improving Performance of Multiplication of Scipy Sparse Matrices

后端未结

关注

 2  1262

Given a Scipy CSC Sparse matrix \"sm\" with dimensions (170k x 170k) with 440 million non-null points and a sparse CSC vector \"v\" (170k x 1) with a few non-null points, is the

相关标签:

2条回答

南方客

2021-02-03 11:26
The sparse matrix multiplication routines are directly coded in C++, and as far as a quick look at the source reveals, there doesn't seem to be any hook to any optimized library. Furthermore, it doesn't seem to be taking advantage of the fact that the second matrix is a vector to minimize calculations. So you can probably speed things up quite a bit by accessing the guts of the sparse matrix, and customizing the multiplication algorithm. The following code does so in pure Python/Numpy, and if the vector really has "a few non-null points" it matches the speed of scipy's C++ code: if you implemented it in Cython, the speed increase should be noticeable:
```
def sparse_col_vec_dot(csc_mat, csc_vec):
    # row numbers of vector non-zero entries
    v_rows = csc_vec.indices
    v_data = csc_vec.data
    # matrix description arrays
    m_dat = csc_mat.data
    m_ind = csc_mat.indices
    m_ptr = csc_mat.indptr
    # output arrays
    sizes = m_ptr.take(v_rows+1) - m_ptr.take(v_rows)
    sizes = np.concatenate(([0], np.cumsum(sizes)))
    data = np.empty((sizes[-1],), dtype=csc_mat.dtype)
    indices = np.empty((sizes[-1],), dtype=np.intp)
    indptr = np.zeros((2,), dtype=np.intp)

    for j in range(len(sizes)-1):
        slice_ = slice(*m_ptr[[v_rows[j] ,v_rows[j]+1]])
        np.multiply(m_dat[slice_], v_data[j], out=data[sizes[j]:sizes[j+1]])
        indices[sizes[j]:sizes[j+1]] = m_ind[slice_]
    indptr[-1] = len(data)
    ret = sps.csc_matrix((data, indices, indptr),
                         shape=csc_vec.shape)
    ret.sum_duplicates()

    return ret
```
A quick explanation of what is going on: a CSC matrix is defined in three linear arrays:
- data contains the non-zero entries, stored in column major order.
- indices contains the rows of the non-zero entries.
- indptr has one entry more than the number of columns of the matrix, and items in column j are found in data[indptr[j]:indptr[j+1]] and are in rows indices[indptr[j]:indptr[j+1]].
So to multiply by a sparse column vector, you can iterate over data and indices of the column vector, and for each (d, r) pair, extract the corresponding column of the matrix and multiply it by d, i.e. data[indptr[r]:indptr[r+1]] * d and indices[indptr[r]:indptr[r+1]].
0 讨论(0)
发布评论:

提交评论
- 加载中...
旧巷少年郎

2021-02-03 11:39
Recently i had the same issue. I solved it like this.
```
def sparse_col_vec_dot(csc_mat, csc_vec):
    curr_mat = csc_mat.tocsr()
    ret curr_mat* csc_vec
```
The trick here is we have to make one version of the matrix as row representation and the other version as column representation.
0 讨论(0)
发布评论:

提交评论
- 加载中...