I want to use scipy.spatial's KDTree to find nearest neighbor pairs in a two-dimensional array (essentially a list of lists where the dimension of the nested list is 2). I gene
I have used scipy.spatial before, and it appears to be a nice improvement (especially with respect to the interface) compared to scikits.ann.
In this case I think you have misread the return value of your tree.query(...) call. From the scipy.spatial.KDTree.query docs:
Returns
-------
d : array of floats
    The distances to the nearest neighbors.
    If x has shape tuple+(self.m,), then d has shape tuple if
    k is one, or tuple+(k,) if k is larger than one. Missing
    neighbors are indicated with infinite distances. If k is None,
    then d is an object array of shape tuple, containing lists
    of distances. In either case the hits are sorted by distance
    (nearest first).
i : array of integers
    The locations of the neighbors in self.data. i is the same
    shape as d.
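To make the shapes in that excerpt concrete, here is a small sketch. The data is only my guess at what yours looks like (50 points on y = x), and the names points, d1, d3, d_cut and i_cut are made up for illustration:

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed stand-in data: 50 two-dimensional points on the line y = x.
xs = np.arange(1, 51, dtype=float)
points = np.column_stack((xs, xs))
tree = KDTree(points)

# Querying 5 points at once: x has shape (5, 2), so "tuple" is (5,).
d1, i1 = tree.query(points[:5])        # k=1 -> shape (5,)
d3, i3 = tree.query(points[:5], k=3)   # k=3 -> shape (5, 3)
print(d1.shape, d3.shape)

# A missing neighbour shows up as an infinite distance. Here the second
# neighbour of [1, 1] is farther away than the upper bound of 1.0.
d_cut, i_cut = tree.query([1, 1], k=2, distance_upper_bound=1.0)
print(d_cut, i_cut)
```

In the last call the index paired with the infinite distance is tree.n (the number of points), which is not a valid row, so check the distance before indexing into your data.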
So in this case, when you query for the nearest neighbour to [1,1], you are getting:
distance to nearest: 0.0
index of nearest in original array: 0
This means that [1,1] is the first row of your original data in array, which is expected given your data is y = x on the range [1,50].
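A minimal sketch that reproduces this result. I am assuming the data was built roughly like this, since the construction is not shown in full; points stands in for your array:

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed reconstruction of the data: y = x for x in [1, 50].
xs = np.arange(1, 51, dtype=float)
points = np.column_stack((xs, xs))

tree = KDTree(points)

# With the default k=1, query returns (distance, index) for a single point.
dist, idx = tree.query([1, 1])
print(dist, idx)      # distance 0.0, index 0
print(points[idx])    # the matching row of the original data
```

The index is a row number into the data the tree was built from, not a coordinate, so points[idx] gives you the neighbour itself.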
The scipy.spatial.KDTree.query function has lots of other options, so if, for example, you wanted to make sure to get the nearest neighbour that isn't the query point itself, try:
tree.query([1,1], k=2)
This will return the two nearest neighbours. You can then apply further logic: when the first distance returned is zero (i.e. the queried point is one of the data items used to build the tree), take the second nearest neighbour rather than the first.
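One way that logic might look, as a sketch rather than the definitive approach. nearest_other is a hypothetical helper name, and the data is again an assumed reconstruction of yours:

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed stand-in data: y = x for x in [1, 50].
xs = np.arange(1, 51, dtype=float)
points = np.column_stack((xs, xs))
tree = KDTree(points)

def nearest_other(tree, point):
    """Hypothetical helper: nearest neighbour of `point`, skipping a zero-distance hit."""
    dists, idxs = tree.query(point, k=2)
    # Hits are sorted nearest first, so a zero first distance means
    # `point` coincides with a point stored in the tree.
    pick = 1 if dists[0] == 0 else 0
    return dists[pick], idxs[pick]

dist, idx = nearest_other(tree, [1, 1])
print(dist, idx, points[idx])

# For nearest-neighbour pairs over the whole data set, query every point
# with k=2 and take the second column (the first column is the point itself).
all_dists, all_idxs = tree.query(points, k=2)
neighbour_of = all_idxs[:, 1]
```

Note this assumes there are no duplicate rows in the data. With duplicates the second hit can also be at distance zero, and you would need a larger k or to filter on the index instead of the distance.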