Correlation coefficients for sparse matrix in python?

前端未结

关注

 4  1914

[愿得一人] 2021-02-07 11:31

Does anyone know how to compute a correlation matrix from a very large sparse matrix in python? Basically, I am looking for something like numpy.corrcoef that will

4条回答

感情败类 (楼主)

2021-02-07 12:19
Unfortunately, Alt's answer didn't work out for me. The values given to the np.sqrt function where mostly negative, so the resulting covariance values were nan.

I wasn't able to use ali_m's answer as well, because my matrix was too large that I couldn't fit the centering = rowsum.dot(rowsum.T.conjugate()) / n matrix in my memory (My matrix's dimensions are: 3.5*10^6 x 33)

Instead, I used scikit-learn's StandardScaler to compute the standard sparse matrix and then used a multiplication to obtain the correlation matrix.
```
from sklearn.preprocessing import StandardScaler

def compute_sparse_correlation_matrix(A):
    scaler = StandardScaler(with_mean=False)
    scaled_A = scaler.fit_transform(A)  # Assuming A is a CSR or CSC matrix
    corr_matrix = (1/scaled_A.shape[0]) * (scaled_A.T @ scaled_A)
    return corr_matrix
```
I believe that this approach is faster and more robust than the other mentioned approaches. Moreover, it also preserves the sparsity pattern of the input matrix.
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...