Optimize MATLAB code (nested for loop to compute similarity matrix)

前端 未结 4 1929
不思量自难忘°
不思量自难忘° 2021-02-11 02:02

I am computing a similarity matrix based on Euclidean distance in MATLAB. My code is as follows:

for i=1:N % M,N is the size of the matrix x for whose elements I         


        
相关标签:
4条回答
  • 2021-02-11 02:37

    There's probably a better way to do it, but the first thing I noticed was that you could cut the runtime in half by exploiting the symmetry D(i,j)==D(i,j)

    You can also use the function norm(x(:,i)-x(:,j),2)

    0 讨论(0)
  • 2021-02-11 02:39

    I think this is what you're looking for.

    D=zeros(N);    
    jIndx=repmat(1:N,N,1);iIndx=jIndx'; %'# fix SO's syntax highlighting
    D(:)=sqrt(sum((x(iIndx(:),:)-x(jIndx(:),:)).^2,2));
    

    Here, I have assumed that the distance vector, x is initalized as an NxM array, where M is the number of dimensions of the system and N is the number of points. So if your ordering is different, you'll have to make changes accordingly.

    0 讨论(0)
  • 2021-02-11 02:44

    The function to do so in matlab is called pdist. Unfortunately it is painfully slow and doesnt take Matlabs vectorization abilities into account.

    The following is code I wrote for a project. Let me know what kind of speed up you get.

       Qx=repmat(dot(x,x,2),1,size(x,1));
       D=sqrt(Qx+Qx'-2*x*x');
    

    Note though that this will only work if your data points are in the rows and your dimensions the columns. So for example lets say I have 256 data points and 100000 dimensions then on my mac using x=rand(256,100000) and the above code produces a 256x256 matrix in about half a second.

    0 讨论(0)
  • 2021-02-11 02:52

    To start with, you are computing twice as much as you need to here, because D will be symmetric. You don't need to calculate the (i,j) entry and the (j,i) entry separately. Change your inner loop to for j=1:i, and add in the body of that loop D(j,i)=D(i,j);

    After that, there's really not much redundancy left in what that code does, so your only room left for improvement is to parallelize it: if you have the Parallel Computing Toolbox, convert your outer loop to a parfor and before you run it, say matlabpool(n), where n is the number of threads to use.

    0 讨论(0)
提交回复
热议问题