Question:

Given two sets of d-dimensional points, how can I most efficiently compute the pairwise squared Euclidean distance matrix in Matlab?

Notation: Set one is given by a (numA,d)-matrix A and set two is given by a (numB,d)-matrix B. The resulting distance matrix should have size (numA,numB).

Example points:

d = 4;            % dimension
numA = 100;       % number of set 1 points
numB = 200;       % number of set 2 points
A = rand(numA,d); % set 1 given as matrix A
B = rand(numB,d); % set 2 given as matrix B

Answer 1:

The answer usually given here is based on bsxfun (cf. e.g. [1]). My proposed approach is based on matrix multiplication and, in my tests, turned out to be much faster than any comparable algorithm I could find:

helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2 ,    B(:,idx), ones(numB,1)];
end
distMat = helpA * helpB';
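Why this product yields squared distances can be written out per coordinate. The following is just the binomial expansion; the symbols a_i and b_j (row i of A and row j of B) are notation introduced here for illustration:

```latex
\begin{aligned}
\lVert a_i - b_j \rVert^2
  &= \sum_{k=1}^{d} (a_{ik} - b_{jk})^2 \\
  &= \sum_{k=1}^{d} \Bigl( 1 \cdot b_{jk}^2 \;+\; (-2\,a_{ik}) \cdot b_{jk} \;+\; a_{ik}^2 \cdot 1 \Bigr)
\end{aligned}
```

For each k, the triple (1, -2 a_ik, a_ik^2) is what the loop writes into helpA and the triple (b_jk^2, b_jk, 1) is what it writes into helpB, so entry (i,j) of helpA * helpB' is exactly this sum.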

Please note: For constant d one can replace the for-loop by hardcoded implementations, e.g.

helpA = [ones(numA,1), -2*A(:,1), A(:,1).^2, ... % d == 2
         ones(numA,1), -2*A(:,2), A(:,2).^2 ];   % etc.
helpB = [B(:,1).^2 ,    B(:,1), ones(numB,1), ...
         B(:,2).^2 ,    B(:,2), ones(numB,1)];
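As a side note, the loop can probably also be avoided for general d. The sketch below is a rearrangement, not the method timed in the evaluation: it groups the columns by role instead of by dimension. Since the same column order is used on both sides, the product should be unchanged; how it compares in speed is untested here.

```matlab
% Sketch: loop-free variant of the matrix-multiplication approach.
d = 4; numA = 100; numB = 200;
A = rand(numA,d);
B = rand(numB,d);

% Columns grouped by role: [1 ... 1, -2*a_1 ... -2*a_d, a_1^2 ... a_d^2]
helpA = [ones(numA,d), -2*A, A.^2];
% Matching order on the B side: [b_1^2 ... b_d^2, b_1 ... b_d, 1 ... 1]
helpB = [B.^2, B, ones(numB,d)];

distMat = helpA * helpB';   % (numA,numB) matrix of squared distances
```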

Evaluation:

%% create some points
d = 2; % dimension
numA = 20000;
numB = 20000;
A = rand(numA,d);
B = rand(numB,d);

%% pairwise distance matrix
% proposed method:
tic;
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2 ,    B(:,idx), ones(numB,1)];
end
distMat = helpA * helpB';
toc;

% compare to pdist2:
tic;
pdist2(A,B).^2;
toc;

% compare to [1]:
tic;
bsxfun(@plus,dot(A,A,2),dot(B,B,2)')-2*(A*B');
toc;

% Another method: added 07/2014
% compare to ndgrid method (cf. Dan's comment)
tic;
[idxA,idxB] = ndgrid(1:numA,1:numB);
distMat = zeros(numA,numB);
distMat(:) = sum((A(idxA,:) - B(idxB,:)).^2,2);
toc;

Result (in the order of the code above: proposed method, pdist2, bsxfun, ndgrid):

Elapsed time is 1.796201 seconds.
Elapsed time is 5.653246 seconds.
Elapsed time is 3.551636 seconds.
Elapsed time is 22.461185 seconds.
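Timings only matter if the methods agree, so a quick sanity check on small inputs may be worthwhile. The sketch below compares the proposed method against an explicit double loop; the sizes are arbitrary, and the names ref and maxErr are introduced here for illustration. Because the expanded form subtracts nearly equal terms, entries for (almost) coinciding points can come out as tiny negative numbers, which the last line clamps to zero.

```matlab
d = 3; numA = 50; numB = 60;   % small sizes so the reference loop is cheap
A = rand(numA,d);
B = rand(numB,d);

% proposed method
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2, B(:,idx), ones(numB,1)];
end
distMat = helpA * helpB';

% reference: explicit double loop over all pairs
ref = zeros(numA,numB);
for i = 1:numA
    for j = 1:numB
        ref(i,j) = sum((A(i,:) - B(j,:)).^2);
    end
end

maxErr = max(abs(distMat(:) - ref(:)));   % largest deviation from the reference
distMat = max(distMat, 0);                % clamp tiny negative rounding errors
```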

For a more detailed evaluation w.r.t. dimension and number of data points, follow the discussion below (@comments). It turns out that different algorithms should be preferred in different settings. In non-time-critical situations, just use the pdist2 version (which requires the Statistics Toolbox).
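To see how the ranking shifts with the dimension on your own machine, something like the following sketch could be used. It relies on timeit (available in newer Matlab releases), which runs each function handle several times and is usually more stable than a single tic/toc. The sizes and dimensions are illustrative assumptions. The matrix-multiplication approach is written here as a single expression with the columns grouped by role rather than by dimension, which gives the same product. A handle for pdist2 could be added in the same way if the toolbox is installed.

```matlab
numA = 2000; numB = 2000;          % illustrative sizes, smaller than in the evaluation
dims = [2 8 32];                   % illustrative dimensions
tMul = zeros(size(dims));
tBsx = zeros(size(dims));
for k = 1:numel(dims)
    d = dims(k);
    A = rand(numA,d);
    B = rand(numB,d);
    % matrix-multiplication approach, written as a single expression
    fMul = @() [ones(numA,d), -2*A, A.^2] * [B.^2, B, ones(numB,d)]';
    % bsxfun approach from the evaluation
    fBsx = @() bsxfun(@plus, dot(A,A,2), dot(B,B,2)') - 2*(A*B');
    tMul(k) = timeit(fMul);
    tBsx(k) = timeit(fBsx);
end
disp([dims; tMul; tBsx]);          % rows: dimension, matmul time, bsxfun time (seconds)
```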

Further development: One can think of replacing the squared Euclidean distance by any other metric that is a sum of per-coordinate terms, based on the same principle:

help = zeros(numA,numB,d);
for idx = 1:d
    help(:,:,idx) = [ones(numA,1), A(:,idx)     ] * ...
                    [B(:,idx)'   ; -ones(1,numB)];
end
distMat = sum(ANYFUNCTION(help),3);
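As a concrete instance, ANYFUNCTION could be abs, which should give the Manhattan (city block) distance, or an elementwise square, which recovers the squared Euclidean distance. In this sketch the array is called diffs instead of help (a renaming made here only to avoid shadowing Matlab's help command). Note that each entry holds B(j,idx) - A(i,idx), so the sign matters for functions that are not even.

```matlab
d = 3; numA = 4; numB = 5;
A = rand(numA,d);
B = rand(numB,d);

diffs = zeros(numA,numB,d);
for idx = 1:d
    % entry (i,j) is B(j,idx) - A(i,idx)
    diffs(:,:,idx) = [ones(numA,1), A(:,idx)] * [B(:,idx)'; -ones(1,numB)];
end

distL1 = sum(abs(diffs),3);    % ANYFUNCTION = abs  -> Manhattan distance
distSq = sum(diffs.^2,3);      % ANYFUNCTION = x.^2 -> squared Euclidean distance
```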

Nevertheless, this is quite time consuming. For smaller d it could be useful to replace the 3-dimensional matrix help by d 2-dimensional matrices. In particular, for d = 1 this gives a way to compute the pairwise differences with a single matrix multiplication:

pairDiffs = [ones(numA,1), A ] * [B'; -ones(1,numB)];
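A tiny worked example may make the sign convention clearer. The values below are made up for illustration: entry (i,j) of pairDiffs is B(j) - A(i), and swapping the roles of the two factors (the variable pairDiffsAB is introduced here) gives A(i) - B(j) instead.

```matlab
A = [1; 2; 3];    % numA = 3 scalar points
B = [10; 20];     % numB = 2 scalar points
numA = numel(A);
numB = numel(B);

pairDiffs = [ones(numA,1), A] * [B'; -ones(1,numB)];
% pairDiffs(i,j) = B(j) - A(i):
%      9    19
%      8    18
%      7    17

% swapping the roles gives A(i) - B(j) instead
pairDiffsAB = [A, ones(numA,1)] * [ones(1,numB); -B'];
```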

Do you have any further ideas?

Source (please cite when reposting): Efficiently compute pairwise squared Euclidean distance in Matlab
