Correlation coefficients for sparse matrix in python?

[愿得一人] 2021-02-07 11:31

Does anyone know how to compute a correlation matrix from a very large sparse matrix in python? Basically, I am looking for something like numpy.corrcoef that will

4 answers
  •  感情败类
    2021-02-07 12:19

    Unfortunately, Alt's answer didn't work for me. The values passed to np.sqrt were mostly negative, so the resulting covariance values were nan.

    I couldn't use ali_m's answer either, because my matrix was so large that the centering = rowsum.dot(rowsum.T.conjugate()) / n matrix did not fit in memory (my matrix's dimensions are 3.5*10^6 x 33).

    Instead, I used scikit-learn's StandardScaler to scale each column of the sparse matrix to unit variance (without centering, so the matrix stays sparse) and then used a matrix multiplication to obtain the correlation matrix.

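    from sklearn.preprocessing import StandardScaler

    def compute_sparse_correlation_matrix(A):
        # with_mean=False scales each column to unit variance without centering it;
        # StandardScaler refuses to center sparse input.
        scaler = StandardScaler(with_mean=False)
        scaled_A = scaler.fit_transform(A)  # assuming A is a CSR or CSC matrix
        n_samples = scaled_A.shape[0]
        # Average product of every pair of scaled columns
        corr_matrix = (scaled_A.T @ scaled_A) / n_samples
        return corr_matrix
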

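    I believe this approach is faster and more robust than the other approaches mentioned. It also preserves the sparsity pattern of the input matrix, since scaling without centering keeps zero entries at zero. Note, however, that because the columns are scaled but not centered, the result matches the Pearson correlation only when the column means are zero.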
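
    If the column means are not zero, the centering term can be subtracted after the multiplication instead of being applied to A itself. The following is a minimal sketch of that idea, not part of the original answer: it assumes samples are in rows and variables in columns, uses the identity cov = E[xy] - E[x]E[y], and the function name sparse_pearson_corr is hypothetical. It returns a dense matrix, which is reasonable only when the number of columns is small.

```python
import numpy as np
from scipy import sparse


def sparse_pearson_corr(A):
    """Pearson correlation between the columns of a sparse matrix.

    Hypothetical helper: A is (n_samples, n_features) and is never densified;
    only the small (n_features, n_features) result is dense.
    """
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[0]

    # Column means as a flat array of length n_features
    mean = np.asarray(A.mean(axis=0)).ravel()

    # Uncentered second moments E[x_i * x_j]
    second_moment = (A.T @ A).toarray() / n

    # Population covariance via cov = E[xy] - E[x]E[y]
    cov = second_moment - np.outer(mean, mean)

    # Clip tiny negative variances caused by floating-point cancellation
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

    # Zero-variance columns produce nan entries rather than raising
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(std, std)
    return corr
```

    For an input with 33 columns, the only dense objects created here are length-33 vectors and 33 x 33 matrices, regardless of the number of rows. Subtracting two nearly equal quantities can lose precision when the means are large relative to the spread, so it may be worth comparing against numpy.corrcoef on a small dense sample.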
