问题
I have an Nx2 array K1
with the location of N keypoints and a 3 dimensional WxHx3 array Kart1(width,height,coordinates)
that maps coordinates to every pixel of an image. For every keypoint in K1
I want to read the location of the pixel in Kart1
and evaluate the coordinates (search for the min/max or calculate the mean) in a 3x3 kernel around it and assign a value to the current pixel in KPCoor1
.
My current approach looks like this:
for ii=1:length(K1(:,1)) %for every keypoint in K1
MinDist=sqrt(sum(Kart1(K1(ii,2)-1,K1(ii,1)-1,:).^2)); %Calculate distance
xShift=0;
yShift=0;
for kk=-1:1 %for every pixel in a 3x3 kernel...
for ll=-1:1
Distance=sqrt(sum(Kart1(K1(ii,2)+kk,K1(ii,1)+ll,:).^2));
if Distance<MinDist %... if the current distance is smaller than MinDist
MinDist=Distance; %... update MinDist...
xShift=kk; %... and take the kernel coordinate of the pixel
yShift=ll;
end
end
end
KP1Coor(ii,:)=Kart1(K1(ii,2)+xShift,K1(ii,1)+yShift,:); %assign the coordinates of the pixel with the minimal distance in kernel.
end
and it runs, but is ugly and I doubt it's doing what I want to do. I am a bit confused by the "multidimensionality" of the matter, don't know many functions to evaluate kernels, and can't think of a way to use vectorization functions like bsxfun()
or logic operations (means I'm stuck and my brain is dry :/)
Any suggestions on how to eliminate those loops/correct the code?
回答1:
Vectorized approach: After diving into the vectorized implementation, it looked like a very interesting look-up problem
, so if you are still interested in vectorized techniques to get the job done, here's an approach with bsxfun -
%// Scalars to replace their repeated usages in the code
[W,H,NC]= size(Kart1);
d3a = 1:NC;
%// Indices for Kart1 at the first level of the nested loops
BI = bsxfun(@plus,K1(:,2)+(K1(:,1)-1)*W,(d3a-1)*W*H);
%// Indices for Kart1 in 3x3 kernel around the indices obtained at first level
BIW3 = bsxfun(@plus,[-1 0 1]',[-W 0 W]); %//'
%// Linear indices for the minimum value of Kart1 in the 3x3 neighborhood
[~,MI3] = min(sqrt(sum(Kart1(bsxfun(@plus,...
BI,permute(BIW3(:),[3 2 1]))).^2,2)),[],3);
%// X-Y indices
[xShift1,yShift1] = ind2sub([3 3],MI3);
%// Get Kart1 values corresponding to the indices for the minimum values
KP1Coor = Kart1(bsxfun(@plus,...
K1(:,2)+xShift1-2 +(K1(:,1)+yShift1-2-1)*W,(d3a-1)*W*H));
Benchmarking
I was able to test it out with GPU too using gpuArray
from Parallel Computing Toolbox and then run some benchmarks with W = 1000
, H = 1000
and used N
as the datasize varying it with [1000 2000 5000 10000 20000]
. The results seem crazy enough, though not unreliable as approved benchmarking methods were used from Measure and Improve GPU Performance. Here is the benchmark plot for original and CPU and GPU vectorized codes -
It looked appropriate to then benchmark only the vectorized codes and for even larger datasizes, whose plot is shown next -
Conclusions: The problem basically looks like a look-up problem, where Kart1
is the data and K1
is the array of indices to look-up for. The vectorized solution presented here is basically a brute-force approach and the benchmark results clearly favor this for performance. But, it would be interesting to see if any non-brute-force approach that might be even loop-based, but leverages this looking up efficiently could perform better.
来源:https://stackoverflow.com/questions/22354771/vectorization-of-min-distance-in-kernel