Correlation coefficients for a sparse matrix in Python?

[愿得一人] 2021-02-07 11:31

Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that will

4 Answers
  •  北荒 (original poster)
    2021-02-07 11:57

    You do not need to build a large dense copy of the data matrix. Keep A sparse and let only the final columns-by-columns result become dense, using NumPy:

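    import numpy as np

    def sparse_corr(A):
        """Pearson correlation between the columns of sparse matrix A (rows are observations)."""
        N = A.shape[0]
        # Column sums as a flat array (faster than the built-in sum over sparse rows)
        s = np.asarray(A.sum(axis=0)).ravel()
        # Covariance (A^T A - s s^T / N) / (N - 1); only this columns-by-columns result is dense
        C = ((A.T @ A).toarray() - np.outer(s, s) / N) / (N - 1)
        # Outer product of the per-column standard deviations
        d = np.sqrt(np.diag(C))
        # The tiny constant avoids division by zero for all-zero columns
        COR = C / (np.outer(d, d) + 1e-119)
        return COR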
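
    For reference, the function appears to rely on the standard identity for the sample covariance of the columns. The symbol s (the vector of column sums of A) and the all-ones vector are notation introduced here for illustration and are not part of the original answer:

```latex
s = A^{\top}\mathbf{1}, \qquad
C = \frac{1}{N-1}\left(A^{\top}A - \frac{s\,s^{\top}}{N}\right), \qquad
\mathrm{COR}_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\,C_{jj}}}
```

    Because A^T A only touches the stored nonzeros, the rows of A never need to be centred explicitly, which is what would otherwise destroy the sparsity.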
    

    Testing the performance:

    from scipy import sparse

    A = sparse.rand(1000000, 100, density=0.1, format='csr')
    sparse_corr(A)
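
    Before timing the large case, it may be worth checking the result against numpy.corrcoef on a matrix small enough to densify. The following is only a sketch: the function is restated in a NumPy 2-compatible form so the block runs on its own, and the matrix size, density, and seed (A_small, 500 x 8, 0.3, random_state=0) are arbitrary choices made for this check, not values from the answer.

```python
import numpy as np
from scipy import sparse


def sparse_corr(A):
    """Pearson correlation between the columns of sparse matrix A."""
    N = A.shape[0]
    # Column sums as a flat 1-D array
    s = np.asarray(A.sum(axis=0)).ravel()
    # Covariance; only this (columns x columns) matrix is dense
    C = ((A.T @ A).toarray() - np.outer(s, s) / N) / (N - 1)
    # Per-column standard deviations
    d = np.sqrt(np.diag(C))
    # Tiny constant avoids division by zero for all-zero columns
    return C / (np.outer(d, d) + 1e-119)


# Small random test matrix (hypothetical sizes, chosen only for this check)
A_small = sparse.random(500, 8, density=0.3, format='csr', random_state=0)

# Dense reference: columns are the variables, so rowvar=False
expected = np.corrcoef(A_small.toarray(), rowvar=False)
result = sparse_corr(A_small)

# Largest element-wise deviation from the dense reference
max_abs_diff = np.max(np.abs(result - expected))
print(max_abs_diff)
```

    If the printed difference is at the level of floating-point round-off, the sparse formula and the dense reference agree on this input.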
    
