A few years ago, someone posted three Python/NumPy functions on ActiveState Recipes for comparison purposes; each of these accepted the same arguments and returned a matrix of pairwise Euclidean distances.
TL;DR: The second function only loops over the number of dimensions of the points (three iterations of the for loop for 3D points), so there is very little looping. The real speed-up is that it better harnesses NumPy to avoid creating extra intermediate matrices when finding the differences between points, which reduces both memory use and computational effort.
Longer Explanation
I think the `calcDistanceMatrixFastEuclidean2` function's loop is deceiving you. It only loops over the number of dimensions of the points: for 1D points the loop executes once, for 2D twice, and for 3D three times. That is really not much looping at all.
Let's analyze the code a bit to see why one is faster than the other. I will call `calcDistanceMatrixFastEuclidean` `fast1` and `calcDistanceMatrixFastEuclidean2` `fast2`.
`fast1` is based on the Matlab way of doing things, as evidenced by the `repmat` function, which in this case creates an array that is just the original data repeated over and over again. However, if you look at the code for the function, it is very inefficient: it uses many NumPy calls (three `reshape`s and two `repeat`s) to do this. The `repeat` function is also used to create an array containing the original data with each data item repeated many times. If our input data is `[1,2,3]`, then we are subtracting `[1,2,3,1,2,3,1,2,3]` from `[1,1,1,2,2,2,3,3,3]`. NumPy has had to create a lot of extra matrices in between runs of its C code, which could have been avoided.
`fast2` uses more of NumPy's heavy lifting without creating as many matrices between NumPy calls. `fast2` loops through each dimension of the points, does the subtraction, and keeps a running total of the squared differences for each dimension. Only at the end is the square root taken. So far this may not sound as efficient as `fast1`, but `fast2` avoids the `repmat` business by using NumPy's indexing. Let's look at the 1D case for simplicity: `fast2` takes a 1D array of the data and subtracts it from a 2D (N x 1) array of the same data. This creates the difference matrix between each point and all of the other points without using `repmat` and `repeat`, thereby bypassing the creation of a lot of extra arrays. This, in my opinion, is where the real speed difference lies: `fast1` creates a lot of intermediate matrices (and creating them is computationally expensive) to find the differences between points, while `fast2` better harnesses the power of NumPy to avoid these.
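To see that broadcasting at work, here is a minimal 1D sketch (not code from the recipes themselves): subtracting an (N,) array from an (N, 1) array broadcasts directly to an (N, N) difference matrix, with no repeated or tiled temporaries materialized along the way.

```python
import numpy as np

points = np.array([1.0, 2.0, 3.0])

# (N, 1) column minus (N,) row broadcasts to an (N, N) matrix where
# diffs[i, j] == points[i] - points[j]
diffs = points[:, np.newaxis] - points

# In 1D, sqrt of the squared difference is just the absolute difference
dist = np.sqrt(diffs ** 2)
```

The multidimensional version does exactly this once per coordinate axis and sums the squared results before the final square root.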
By the way, here is a slightly faster version of `fast2`:
```python
import numpy as np

def calcDistanceMatrixFastEuclidean3(nDimPoints):
    nDimPoints = np.array(nDimPoints)
    n, m = nDimPoints.shape
    # Initialize delta with the squared differences along the first dimension
    data = nDimPoints[:, 0]
    delta = (data - data[:, np.newaxis]) ** 2
    # Accumulate the squared differences for the remaining dimensions
    for d in range(1, m):
        data = nDimPoints[:, d]
        delta += (data - data[:, np.newaxis]) ** 2
    return np.sqrt(delta)
```
The difference is that we no longer create `delta` as a matrix of zeros and add into it; the squared differences along the first dimension initialize it instead, saving one full-size allocation.
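As an aside (this is not from the original post), the per-dimension loop can be removed entirely by broadcasting over a third axis, at the cost of materializing an (n, n, m) temporary. A sketch, checked against a hypothetical brute-force reference:

```python
import numpy as np

def naive_distances(points):
    # Hypothetical brute-force reference, just for checking the result
    n = len(points)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))
    return out

def broadcast_distances(points):
    points = np.asarray(points)
    # (n, 1, m) minus (n, m) broadcasts to (n, n, m); summing out the
    # last axis gives the (n, n) matrix of squared distances
    return np.sqrt(((points[:, np.newaxis, :] - points) ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
pts = rng.random((50, 3))
assert np.allclose(broadcast_distances(pts), naive_distances(pts))
```

Whether this beats the per-dimension loop depends on n and m: the fully broadcast version makes fewer Python-level calls but uses m times more temporary memory, so for large point sets the looped version can actually be kinder to the cache.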