Directly use Intel mkl library on Scipy sparse matrix to calculate A dot A.T with less memory

前端 未结 1 1414
情话喂你
情话喂你 2020-12-01 15:35

I want to call mkl.mkl_scsrmultcsr from python. The goal is to calculate a sparse matrix C in compressed sparse row format. Sparse matrix C is the matrix product between A a

相关标签:
1条回答
  • 2020-12-01 16:14

    Look at the Python code for the scipy sparse product. Notice that it calls the compiled code in 2 passes.

    It looks like the mkl code does the same thing

    https://software.intel.com/en-us/node/468640

    If request=1, the routine computes only values of the array ic of length m + 1, the memory for this array must be allocated beforehand. On exit the value ic(m+1) - 1 is the actual number of the elements in the arrays c and jc.

    If request=2, the routine has been called previously with the parameter request=1, the output arrays jc and c are allocated in the calling program and they are of the length ic(m+1) - 1 at least.

    You first allocated ic based on the number of rows of C (you know that from the inputs), and call the mkl code with request=1.

    For request=2 you have to allocate c and jc arrays, based on the size in ic(m+1) - 1. This is not the same as the number of nnz in the input arrays.

    You are using request1 = c_int(0), which requires that the c arrays be the correct size - which you don't know without actually performing the dot (or a request 1).

    ==================

    File:        /usr/lib/python3/dist-packages/scipy/sparse/compressed.py
    Definition:  A._mul_sparse_matrix(self, other)
    

    pass 1 allocates indptr (note size), and passes the pointers (data doesn't matter at this pass)

        indptr = np.empty(major_axis + 1, dtype=idx_dtype)
    
        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)
    
        nnz = indptr[-1]
    

    pass 2 allocates indptr (different size), and based on nnz indices and data.

        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
    
        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)
    

    Last make a new array using these arrays.

        return self.__class__((data,indices,indptr),shape=(M,N))
    

    The mkl library should be used in the same way.

    ===================

    https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h

    has c code for csr_matmat_pass1 and csr_matmat_pass2

    ====================

    In case it helps, here's a pure Python implementation of these passes. A literal translation without any attempt to take advantage of any array operations.

    def pass1(A, B):
        nrow,ncol=A.shape
        Aptr=A.indptr
        Bptr=B.indptr
        Cp=np.zeros(nrow+1,int)
        mask=np.zeros(ncol,int)-1
        nnz=0
        for i in range(nrow):
            row_nnz=0
            for jj in range(Aptr[i],Aptr[i+1]):
                j=A.indices[jj]
                for kk in range(Bptr[j],Bptr[j+1]):
                    k=B.indices[kk]
                    if mask[k]!=i:
                        mask[k]=i
                        row_nnz += 1
            nnz += row_nnz
            Cp[i+1]=nnz
        return Cp
        
    def pass2(A,B,Cnnz):
        nrow,ncol=A.shape
        Ap,Aj,Ax=A.indptr,A.indices,A.data
        Bp,Bj,Bx=B.indptr,B.indices,B.data
    
        next = np.zeros(ncol,int)-1
        sums = np.zeros(ncol,A.dtype)
        
        Cp=np.zeros(nrow+1,int)
        Cj=np.zeros(Cnnz,int)
        Cx=np.zeros(Cnnz,A.dtype)
        nnz = 0
        for i in range(nrow):
            head = -2
            length = 0
            for jj in range(Ap[i], Ap[i+1]):
                j, v = Aj[jj], Ax[jj]
                for kk in range(Bp[j], Bp[j+1]):
                    k = Bj[kk]
                    sums[k] += v*Bx[kk]
                    if (next[k]==-1):
                        next[k], head=head, k
                        length += 1
            print(i,sums, next)
            for _ in range(length):
                if sums[head] !=0:
                    Cj[nnz], Cx[nnz] = head, sums[head]
                    nnz += 1
                temp = head
                head = next[head]
                next[temp], sums[temp] = -1, 0
            Cp[i+1] = nnz            
        return Cp, Cj, Cx                  
        
    def pass0(A,B):
        Cp = pass1(A,B)
        nnz = Cp[-1]
        Cp,Cj,Cx=pass2(A,B,nnz)
        N,M=A.shape[0], B.shape[1]
        C=sparse.csr_matrix((Cx, Cj, Cp), shape=(N,M))
        return C
    

    Both A and B have to be csr. It using a transpose it needs conversion, e.g. B = A.T.tocsr().

    0 讨论(0)
提交回复
热议问题