Given a set of several million points with x,y coordinates, what is the algorithm of choice for quickly finding the top 1000 nearest points from a location? \"Quickly\" here mea
If the set of points rarely changes, you could also consider using a voronoi diagram. I'm not sure if that helps finding the first point faster, but it should make it a lot easier to find the next 999 points.
I assume the points are in a database or some searchable indexed location? If so it should be pretty quick. From the given point you can have a range on the x and y axis and get all locations within that range (i.e. specify the top left most corner x(a) and y(b) and bottom most right corner x(c) and y(d)).
Then do a query where for points where y >= b AND y <= d AND x >= a AND x <=c. this will be quick assuming you have indexes on the x and y coordinates seperatly. (assuming origin is 0,0 at top left).
You can then increase (or decrease if result is huge) this range by z until the number of points within the result set is >= 1000. Through some trial runs you should be able to come up with a standard deviation and other statistical numbers that will help you determine the size of the rectangle to start with. Your program can also tune its self for this based on the results it gets.
Once you have the rough data set its pretty simple maths to work out the distance between each point and the source point.
i know its been said as not being the fastest if you want REALLY REALLY fast results by seeing i found this post from google i thought i'd add my SQL solution that i used a while ago in the form of a stored proc. It looks for locations close by the a coord and returns them by distance.
I hope it helps someone :)
CREATE PROCEDURE [dbo].[getstores] @lat float, @lng float AS
DECLARE @radius float, @DegToRad float
SET @DegToRad = 57.29577951
SET @radius = 25000
SELECT TOP 10
name
,sto_lat
,sto_lng
,postcode
,ROUND((ACOS((SIN(@lat/57.2958) * SIN(sto_lat/@DegToRad)) +(COS(@lat/@DegToRad) * COS(sto_lat/@DegToRad) *COS(sto_lng/@DegToRad - @lng/@DegToRad))))* 6387.7, 2) AS distance
FROM store
WHERE (sto_lat >= @lat - (@radius/111))
And (sto_lat <= @lat + (@radius/111))
AND (sto_lng >= @lng - (@radius/111))
AND (sto_lng <= @lng + (@radius/111))
AND (
ISNUMERIC(sto_lat) = 1
AND
ISNUMERIC(sto_lat) = 1
)
ORDER BY distance
NOTE: i have already stated that this is not the best solution for this question simply maybe for someone who found this on google like me
In addition to the QuadTree and BSP tree suggestions, you should look up nearest neighbour searching. The choice of algorithm is based on how often you are adding to your base dataset. If you are adding and removing often, tree solutions are superior. If the data is more static, nearest neighbour searching and voronoi diagrams can be much faster and scale better.
You want to use a structure like a Quad tree, or an RTree. These are multidimensional index structures.
The key is using a good "space filling curve", which is what helps define the nearness of points. A simple space filling curve is a Zorder, but you would be more interested in something like a hilbert curve.
http://en.wikipedia.org/wiki/Space_filling_curve
I don't know of any prepackaged implementations of this stuff. I recently implemented my own RTree in 2 dimensions that only supports bulk loading and searches (via a provided bounding box).
One drawback here is that your points have to be contained in a finite region. There know there are space filling curves that work for spaces that are not finite, but I do not know anything about them.
How about using quadtree?
You divide area to rectangles, if area has low density of points, rectangles are large, and if area has high density of points, rectangles will be small. You recursively subdivide each rectangle to four sub rectangles until rectangles are small enough or contain few enough points.
You can then start looking at points in rectangles near the location, and move outwards until you have found your 1000 points.
Code for this could get somewhat complex, so maybe you should try first with the simple grid and see if it is fast enough.