Efficient algorithm to group points in clusters by distance between every two points

这一生的挚爱 submitted on 2021-02-07 13:30:56

Question


I am looking for an efficient algorithm for the following problem:

Given a set of points in 2D space, where each point is defined by its X and Y coordinates, I need to split the set into clusters so that if the distance between any two points is less than some threshold, those points belong to the same cluster.

In other words, such a cluster is a set of points that are 'close enough' to each other.

The naive algorithm may look like this:

  1. Let R be the resulting list of clusters, initially empty.
  2. Let P be a list of points, initially containing all points.
  3. Pick a random point from P and create a cluster C that contains only this point. Delete this point from P.
  4. For every point Pi in P:
     4a. For every point Pc in C:
         4aa. If distance(Pi, Pc) < threshold, add Pi to C and remove it from P.
  5. If at least one point was added to cluster C during step 4, go to step 4.
  6. Add cluster C to list R. If P is not empty, go to step 3.
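
For reference, the numbered steps above could be sketched in Python roughly as follows. This is only an illustration: the function name `naive_clusters` and the representation of points as `(x, y)` tuples are assumptions, not part of the question.

```python
import math
import random


def naive_clusters(points, threshold):
    """Grow one cluster at a time, following the numbered steps."""
    remaining = list(points)   # P: points not yet assigned
    result = []                # R: finished clusters
    while remaining:
        # Step 3: seed a new cluster C with a random point from P.
        seed = remaining.pop(random.randrange(len(remaining)))
        cluster = [seed]
        added = True
        while added:           # Step 5: repeat while the cluster keeps growing.
            added = False
            still_remaining = []
            for p in remaining:
                # Step 4: move p into C if it is close to any point already in C.
                if any(math.dist(p, c) < threshold for c in cluster):
                    cluster.append(p)
                    added = True
                else:
                    still_remaining.append(p)
            remaining = still_remaining
        result.append(cluster)  # Step 6
    return result
```

Every pass over P compares each remaining point against potentially the whole cluster, which is where the cost of this approach comes from.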

However, the naive approach is very inefficient. Is there a better algorithm for this problem?

P.S. I don't know the number of clusters a priori.


Answer 1:


There are some classic algorithms here:

  • Hierarchical Agglomerative Clustering
  • DBSCAN

that you should read and understand.
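
As a rough illustration of how this relates to the question: single-linkage agglomerative clustering, cut at the threshold, produces the connected components of the graph that links every pair of points closer than the threshold. A minimal sketch with a disjoint-set (union-find) structure might look like the following. The name `threshold_clusters` and the `(x, y)` tuple representation are assumptions, and the sketch still checks all pairs, so it is O(n²) in the number of points.

```python
import math
from itertools import combinations


def threshold_clusters(points, threshold):
    """Connected components of the 'closer than threshold' graph."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of points that is closer than the threshold.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < threshold:
            parent[find(i)] = find(j)

    # Collect points by the root of their set.
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```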




Answer 2:


  1. Split up the space of points into a grid whose squares have side length threshold / sqrt(8). With this size, the diagonal of a 2×2 block of squares equals the threshold, so any two points in the same or touching squares are closer than the threshold.

  2. Iterate through the list of points P, adding each point to both the square it occupies and a new cluster. If a point is added to a square which already contains a point, add it to the cluster of the other point(s). I'll call the list of all occupied squares S.

  3. Now take any square from S and its cluster c. For each adjacent or diagonal square, combine the cluster of that square with c and remove the square from S. Repeat the process for all squares just added.

  4. Once no more adjacent squares can be found, the cluster is finished and can be added to the list of finished clusters C. Repeat step 3 with any remaining squares in S. When S is empty, you're finished.
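
A grid-based approach along these lines could be sketched as below. Note that this is a variant, not the exact scheme from the steps above: it assumes a cell side equal to the threshold and checks actual distances between points in the same or neighbouring cells, because with small cells and no distance check, two close points separated by an empty cell would not be merged. The name `grid_clusters` and the `(x, y)` tuple representation are also assumptions.

```python
import math
from collections import defaultdict


def grid_clusters(points, threshold):
    """Cluster points by bucketing them into threshold-sized grid cells."""
    # Map each cell (column, row) to the indices of the points inside it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / threshold), math.floor(y / threshold))].append(idx)

    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two points closer than the threshold always lie in the same cell or in
    # touching cells, so only the 3x3 neighbourhood needs to be examined.
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) < threshold:
                            parent[find(i)] = find(j)

    # Collect points by the root of their set.
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

When points are spread out so that each cell holds only a few of them, this compares each point against a small neighbourhood instead of the whole set; if many points fall into the same cell, it degrades towards the all-pairs cost.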



Source: https://stackoverflow.com/questions/32428520/efficient-algorithm-to-group-points-in-clusters-by-distance-between-every-two-po
