Does anyone know how to compute a correlation matrix from a very large sparse matrix in python? Basically, I am looking for something like numpy.corrcoef
that will
You do not need to introduce a large dense matrix. Just keep it sparse using Numpy:
import numpy as np
def sparse_corr(A):
N = A.shape[0]
C=((A.T*A -(sum(A).T*sum(A)/N))/(N-1)).todense()
V=np.sqrt(np.mat(np.diag(C)).T*np.mat(np.diag(C)))
COR = np.divide(C,V+1e-119)
return COR
A = sparse.rand(1000000, 100, density=0.1, format='csr')
sparse_corr(A)