How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) data set, possibly in a distributed manner? Any efficient distributed algorithm suggestion will be appreciated.
Each local partition of the data can be reduced to sums, sums of squares, and sums of cross-products. From these you can recover the standard deviations and covariances, and covariance divided by the product of the two standard deviations gives the Pearson correlation.
A working example is here: https://github.com/jeesim2/distributed-correlation
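As a minimal sketch of that idea (not the code from the linked repo): each worker reduces its partition to sufficient statistics (row count, column sums, and X^T X), the driver merges them, and the correlation matrix is computed from the merged sums. Function names and the NumPy-based single-process setup are illustrative assumptions; in practice the map/reduce steps would run on your cluster framework.

```python
import numpy as np

def partial_stats(chunk):
    """Per-partition reduction: chunk is an (n_rows, n_cols) array."""
    n = chunk.shape[0]
    s = chunk.sum(axis=0)          # column sums
    ss = chunk.T @ chunk           # sums of cross-products
    return n, s, ss

def merge(stats):
    """Add up per-partition statistics (associative, so merge order is free)."""
    n = sum(t[0] for t in stats)
    s = sum(t[1] for t in stats)
    ss = sum(t[2] for t in stats)
    return n, s, ss

def correlation(n, s, ss):
    """Covariance from merged sums, then normalize by standard deviations."""
    cov = (ss - np.outer(s, s) / n) / (n - 1)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# Example: three in-memory "partitions" standing in for distributed chunks.
rng = np.random.default_rng(0)
chunks = [rng.normal(size=(1000, 4)) for _ in range(3)]
stats = [partial_stats(c) for c in chunks]   # map step (one per node)
print(correlation(*merge(stats)))            # reduce step (on the driver)
```

Because the per-partition statistics are small (a scalar, a vector, and a p×p matrix for p columns), only they need to cross the network, not the 10 TB of raw data.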
To start with, have a look at this to see if things are going right. You may then refer to any of these implementations: MPI/OpenMP: Agomezl or Meismyles; MapReduce: Vangjee or Seawolf42. It'd also be interesting to read this before you proceed. On a different note, James's thesis provides some pointers if you're interested in computing correlations that are robust to outliers.