Given a set of N points in the 2D plane (x and y coordinates), and a set of N radii, one for each point. We will refer to a point's disc as the disc of that point's radius centered on that point.
It sounds like the obvious O(n^2) algorithm would be to create a graph with the points as vertices, and then connect two points if the conditions you give are met. And then you read off the connected components of the graph, discarding singletons. Also, the condition you gave for clustering sounds symmetric to me. Am I missing something?
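That graph-plus-connected-components idea might look like this (a sketch in Python; the pairwise check uses squared distances to avoid square roots):

```python
from collections import deque

def cluster(points, radii):
    """Connect i and j when either point lies in the other's disc,
    i.e. dist(i, j) <= max(radii[i], radii[j]); return the connected
    components of size >= 2 (singletons discarded)."""
    n = len(points)
    # Build adjacency lists with the obvious O(n^2) pairwise check.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= max(radii[i], radii[j]) ** 2:
                adj[i].append(j)
                adj[j].append(i)
    # Read off connected components with a BFS.
    seen, components = [False] * n, []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    comp.append(v)
                    queue.append(v)
        if len(comp) > 1:  # discard singletons
            components.append(comp)
    return components
```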
k-means clustering based on a combination of local search and Lloyd's algorithm
http://www.cs.umd.edu/~mount/Projects/KMeans/
(Program is distributed under conditions of the GNU General Public License.)
k-means, k-medians, k-medoids, treecluster, self-organizing maps, clustercentroids, clusterdistance http://bonsai.hgc.jp/~mdehoon/software/cluster/cluster.pdf
You have a collection U of pairs (p,R) where p is a point and R its radius.
The relation ~ on U : (p,R) ~ (q,S) <=> p lies in q's disc or q lies in p's disc <=> |p-q| <= max(R,S)
is clearly reflexive and symmetric, and so its transitive closure (&#8776;, say) is an equivalence relation. The equivalence classes under &#8776; will be (singletons or) clusters.
I believe there are standard algorithms to compute the equivalence classes of the transitive closure of a relation like ~ above. For example, this is discussed in Numerical Recipes in the chapter on sorting, and they say that their routine is based on Knuth.
(Sorry not to provide a link but a brief search didn't come up with exactly the right thing).
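For what it's worth, a union-find (disjoint-set) structure is the standard way to compute those equivalence classes. A minimal sketch, assuming each pair is ((x, y), radius):

```python
from math import hypot

def find(parent, i):
    """Path-compressing find for the union-find forest."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def equivalence_classes(pairs):
    """pairs: list of (point, radius), point = (x, y).
    Union i and j whenever (p,R) ~ (q,S), i.e. |p-q| <= max(R,S);
    the resulting union-find components are exactly the classes of
    the transitive closure of ~."""
    n = len(pairs)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            (p, R), (q, S) = pairs[i], pairs[j]
            if hypot(p[0] - q[0], p[1] - q[1]) <= max(R, S):
                parent[find(parent, i)] = find(parent, j)
    classes = {}
    for i in range(n):
        classes.setdefault(find(parent, i), []).append(i)
    return list(classes.values())
```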
The brute force solution is only O(N^2), so it should work for you.
1) Start with all of the points in the unassigned group.
2) Pick one point, look at all the others in the unassigned group, and see whether they meet the radii criterion you describe; move the ones that do into a new group along with the picked point. Repeat until the unassigned group is empty.
3) At the end you will have grouped the points by your criteria, and will have done no more than N*(N/2) inspections.
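The steps above could be sketched like this (a sketch; `linked` is a hypothetical helper for the radii criterion, and each group is grown transitively so the criterion chains through neighbours):

```python
from math import dist  # Python 3.8+

def linked(a, b):
    """True when either point lies in the other's disc."""
    (pa, ra), (pb, rb) = a, b
    return dist(pa, pb) <= max(ra, rb)

def group(items):
    """items: list of ((x, y), radius). Repeatedly pick an unassigned
    point, pull in every unassigned point that meets the criterion
    (and, transitively, their neighbours), and emit that group."""
    unassigned = list(items)
    groups = []
    while unassigned:
        current = [unassigned.pop()]   # step 2: pick one point
        i = 0
        while i < len(current):        # grow the group transitively
            picked = current[i]
            still = []
            for other in unassigned:
                if linked(picked, other):
                    current.append(other)
                else:
                    still.append(other)
            unassigned = still
            i += 1
        groups.append(current)         # step 3: every point ends up grouped
    return groups
```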
Btw, what you describe is not what's normally meant by "clustering", so I think that's throwing people off here. What makes clustering a difficult problem is that the question of whether two neighboring points will be assigned to the same cluster is determined by all the other points in the data set. In your case, it's (basically) only determined by properties of the two points, so that in your case, you can just check them all.
Clustering is an NP-hard problem even if you are given the number of clusters a priori, so you can probably give up on getting a polynomial run time. There are many techniques for it, and the literature is found mainly in the machine learning community; k-means is probably the easiest algorithm to understand and implement.
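A minimal sketch of k-means (plain Lloyd's algorithm, seeding the centroids with the first k points for simplicity; real implementations use smarter seeding such as k-means++):

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate (1) assigning each point to its
    nearest centroid and (2) moving each centroid to the mean of its
    assigned points."""
    centroids = [p for p in points[:k]]  # naive seeding: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda c: (x - centroids[c][0]) ** 2
                                      + (y - centroids[c][1]) ** 2)
            clusters[nearest].append((x, y))
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if its cluster emptied
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, clusters
```

Note that unlike the disc criterion above, k-means needs k chosen in advance and can converge to a local optimum that depends on the seeding.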