What is a fast way to compute column-by-column correlation in MATLAB?

粉色の甜心 2021-02-03 11:44

I have two very large matrices (60x25000) and I'd like to compute the correlation only between corresponding columns of the two matrices (column i of the first against column i of the second). For example:

corrVal(1) = c         


        
2 answers
  • 2021-02-03 11:48

    I get roughly a 100x speedup by computing the correlation by hand:

    An=bsxfun(@minus,A,mean(A,1)); %%% zero-mean
    Bn=bsxfun(@minus,B,mean(B,1)); %%% zero-mean
    An=bsxfun(@times,An,1./sqrt(sum(An.^2,1))); %% L2-normalization
    Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1))); %% L2-normalization
    C=sum(An.*Bn,1); %% correlation
    

    You can compare the two approaches with this code:

    A=rand(60,25000);
    B=rand(60,25000);
    
    tic;
    C=zeros(1,size(A,2));
    for i = 1:size(A,2)
        C(i)=corr(A(:,i), B(:,i));
    end
    toc; 
    
    tic
    An=bsxfun(@minus,A,mean(A,1));
    Bn=bsxfun(@minus,B,mean(B,1));
    An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
    Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1)));
    C2=sum(An.*Bn,1);
    toc
    mean(abs(C-C2)) %% difference between methods
    

    Here are the timings (loop first, then the vectorized version):

    Elapsed time is 10.822766 seconds.
    Elapsed time is 0.119731 seconds.
    

    The difference between the two results is very small:

    mean(abs(C-C2))
    
    ans =
      3.0968e-17
    

    EDIT: explanation

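    bsxfun applies a binary operation element-wise, expanding singleton dimensions as needed. With a 1-by-N row vector as the second input, as here, each column of the matrix is combined with its own entry of the vector (a column vector would act row by row instead).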

    An=bsxfun(@minus,A,mean(A,1));
    

    This line subtracts (@minus) the mean of each column (mean(A,1)) from that column of A, so the columns of An are zero-mean.
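
    As a small illustration of that centering step (the 2-by-2 matrix M and the names Mc and colMeans are made up for this example and are not part of the original answer):

    ```matlab
    % Illustrative only: center the columns of a tiny matrix with bsxfun.
    M = [1 2; 3 6];
    Mc = bsxfun(@minus, M, mean(M, 1));  % mean(M,1) is the row vector [2 4]
    % Mc is now [-1 -2; 1 2]
    colMeans = mean(Mc, 1);              % each column mean is now 0
    ```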

    An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
    

    This line multiplies (@times) each column by the inverse of its L2 norm, so every column of An ends up with unit L2 norm.

    Once the columns are zero-mean and L2-normalized, the correlation is just the dot product of each column of An with the corresponding column of Bn. So you multiply them element-wise, An.*Bn, and then sum down each column: sum(An.*Bn,1).
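
    On MATLAB R2016b or later, the same steps can probably be written without bsxfun, since arithmetic operators expand singleton dimensions implicitly. The following is only a sketch under that assumption. The small 60-by-5 test matrices and the names colCorr and refVal are hypothetical. The cross-check uses corrcoef, which ships with base MATLAB, rather than corr.

    ```matlab
    % Sketch: column-wise Pearson correlation using implicit expansion
    % (assumes MATLAB R2016b or later; variable names are illustrative).
    A = rand(60, 5);
    B = rand(60, 5);

    An = A - mean(A, 1);              % subtract each column's mean
    Bn = B - mean(B, 1);
    An = An ./ sqrt(sum(An.^2, 1));   % scale each column to unit L2 norm
    Bn = Bn ./ sqrt(sum(Bn.^2, 1));
    colCorr = sum(An .* Bn, 1);       % 1-by-5 row of per-column correlations

    % Cross-check the first column against corrcoef (base MATLAB).
    R = corrcoef(A(:, 1), B(:, 1));
    refVal = R(1, 2);
    ```

    Here colCorr(1) and refVal should agree to within floating-point rounding. A constant column has zero norm after centering, so its correlation comes out as NaN.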

  • 2021-02-03 12:01

    I think the obvious loop might be good enough for a problem of this size. On my laptop, the following takes less than 6 seconds:

    A = rand(60,25000);
    B = rand(60,25000);
    n = size(A,1);
    m = size(A,2);
    
    corrVal = zeros(1,m);
    for k=1:m
        corrVal(k) = corr(A(:,k),B(:,k));
    end
    