Distributed cross correlation matrix computation

甜味超标 2021-01-12 02:37

How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) dataset, possibly in a distributed manner? Any suggestions for an efficient distributed algorithm would be appreciated.

2 Answers
  •  不知归路
    2021-01-12 03:26

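    Each local dataset can be converted into standard deviations and covariances. Those, together with the sums (and row counts) of each partition, are enough to compute the global correlation, since corr(X, Y) = cov(X, Y) / (σ_X · σ_Y).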
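    One way to make the merge step concrete, assuming each partition reports its row count n, column-mean vector μ, and centered cross-product matrix M = Σ (x − μ)(x − μ)ᵀ (which is (n − 1) times its sample covariance). This is the pairwise update of Chan, Golub and LeVeque, not necessarily what the linked repository uses. Two partitions A and B combine as:

    ```latex
    \begin{aligned}
    n &= n_A + n_B, \qquad \delta = \mu_B - \mu_A,\\
    \mu &= \mu_A + \frac{n_B}{n}\,\delta,\\
    M &= M_A + M_B + \frac{n_A\,n_B}{n}\,\delta\,\delta^{\top},\\
    r_{ij} &= \frac{M_{ij}}{\sqrt{M_{ii}\,M_{jj}}}
    \end{aligned}
    ```

    The covariance is M/(n − 1) and the standard deviation of column i is √(M_ii/(n − 1)), so the (n − 1) factor cancels in r_ij. Because the merge is associative, partitions can be combined in any order, for example as a tree reduction across workers.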

    Here is a working example: https://github.com/jeesim2/distributed-correlation
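A minimal single-machine sketch of the map/reduce idea, assuming each partition fits in memory as a NumPy array with one column per variable. The helper names `partial_stats`, `merge`, and `correlation_from_stats` are hypothetical and are not taken from the linked repository; in a real cluster the map step would run on each worker and only the small per-partition summaries (size proportional to the number of columns squared) would be shipped to the reducer.

```python
from functools import reduce

import numpy as np


def partial_stats(block):
    """Map step: summarize one partition as (row count, column means, centered cross-products)."""
    n = block.shape[0]
    mean = block.mean(axis=0)
    centered = block - mean
    m = centered.T @ centered  # equals (n - 1) * sample covariance of this partition
    return n, mean, m


def merge(a, b):
    """Reduce step: combine two partition summaries (pairwise update, Chan et al. style)."""
    n_a, mean_a, m_a = a
    n_b, mean_b, m_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m = m_a + m_b + np.outer(delta, delta) * (n_a * n_b / n)
    return n, mean, m


def correlation_from_stats(stats):
    """Turn a merged summary into a Pearson correlation matrix."""
    _, _, m = stats
    std = np.sqrt(np.diag(m))  # the (n - 1) factor cancels in the ratio below
    return m / np.outer(std, std)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 5))
    data[:, 1] += 0.8 * data[:, 0]  # introduce some correlation between columns 0 and 1

    # Pretend each chunk lives on a different worker.
    chunks = np.array_split(data, 8)
    summaries = [partial_stats(c) for c in chunks]  # map
    merged = reduce(merge, summaries)               # reduce
    corr = correlation_from_stats(merged)

    print(np.allclose(corr, np.corrcoef(data, rowvar=False)))  # True
```

Centering within each partition before merging, rather than accumulating raw sums of squares, avoids the catastrophic cancellation that the naive Σx² − (Σx)²/n formula can suffer on very large datasets.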
