Algorithm for detecting “clusters” of dots [closed]

忘掉有多难

16 answers

清歌不尽

2020-12-22 18:22
1. Fit a probability density function to the data. I would use a "mixture of Gaussians" (MoG) and fit it with Expectation Maximisation (EM), initialised by the K-means algorithm. K-means on its own is sometimes sufficient without EM. The number of clusters would itself need to be chosen with a model-order selection method.
2. Then score each point by its density p(x) under the fitted model, i.e. how likely it is that the model generated that point.
3. The local maxima of p(x) give the cluster centroids.
This can be coded very quickly in a tool like MATLAB with a machine learning toolbox. MoG, EM learning and K-means clustering are discussed widely on the web and in standard texts; my favourite is "Pattern Classification" by Duda and Hart.
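The three steps above can be sketched in plain Python. This is a minimal sketch, not the answerer's code: it assumes 2-D points, isotropic (spherical) Gaussian components, and BIC (Bayesian information criterion) as the model-order selection method, and the names `kmeans`, `fit_gmm`, `log_density` and `select_k` are hypothetical. For step 3 it reads the centroids off the fitted means, which sit at or near the maxima of p(x) only when the components are well separated.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; its centroids prime EM (and may suffice on their own)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to its group's mean; an empty group keeps the old one.
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids


def log_gauss(p, mu, var):
    """Log-density of an isotropic 2-D Gaussian N(mu, var * I) at point p."""
    d2 = (p[0] - mu[0]) ** 2 + (p[1] - mu[1]) ** 2
    return -d2 / (2 * var) - math.log(2 * math.pi * var)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def component_logs(p, weights, means, variances):
    """log(weight_j) + log N(p | mean_j, var_j) for every component j."""
    return [math.log(w) + log_gauss(p, m, v) for w, m, v in zip(weights, means, variances)]


def fit_gmm(points, k, iters=100, seed=0):
    """Step 1: EM for a mixture of k isotropic Gaussians, primed by k-means."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Assumption: every component starts with the overall per-axis variance of the data.
    var0 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / (2 * n) or 1.0
    weights, means, variances = [1.0 / k] * k, kmeans(points, k, seed=seed), [var0] * k
    for _ in range(iters):
        # E-step: resp[i][j] = posterior probability that component j generated point i.
        resp = []
        for p in points:
            logs = component_logs(p, weights, means, variances)
            total = logsumexp(logs)
            resp.append([math.exp(lg - total) for lg in logs])
        # M-step: re-estimate each component's weight, mean and variance.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sq = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                     for r, p in zip(resp, points))
            weights[j], means[j] = nj / n, (mx, my)
            variances[j] = max(sq / (2 * nj), 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


def log_density(p, model):
    """Step 2: log p(x) of a point under the fitted mixture."""
    return logsumexp(component_logs(p, *model))


def select_k(points, k_max):
    """Model-order selection (assumed criterion: BIC); returns (k, model)."""
    best = None
    for k in range(1, k_max + 1):
        model = fit_gmm(points, k)
        loglik = sum(log_density(p, model) for p in points)
        n_params = 4 * k - 1  # 2k means + k variances + (k - 1) free weights
        bic = -2 * loglik + n_params * math.log(len(points))
        if best is None or bic < best[0]:
            best = (bic, k, model)
    return best[1], best[2]
```

Typical use would be `k, model = select_k(points, k_max=5)`, after which `model[1]` holds the centroids. The isotropic assumption keeps the EM updates short; a toolbox routine with full covariance matrices would handle elongated clusters better.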
轻奢々

2020-12-22 18:22

You could use a genetic algorithm for this. If you define a "cluster" as, say, a rectangular sub-area with a high dot density, you could create an initial set of "solutions", each consisting of some number of randomly generated, non-overlapping rectangular clusters. You would then write a "fitness function" that evaluates each solution; here, it should favour solutions with fewer clusters and a higher dot density within each cluster.

Your initial "solutions" will most likely all be terrible, but some will be slightly less terrible than others. You use the fitness function to eliminate the worst solutions, then create the next generation by cross-breeding the "winners" from the previous generation. Repeating this process generation after generation should leave you with one or more good solutions to the problem.

For a genetic algorithm to work, candidate solutions have to differ incrementally from one another in how well they solve the problem. Dot clusters are perfect for this.
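A minimal sketch of the genetic algorithm described above, assuming 2-D dots and axis-aligned rectangles. The fitness function is a hypothetical choice, since the answer does not give one: each covered dot scores +1, each rectangle costs `rect_cost`, and area is charged at `density_factor` times the average dot density, so sparse rectangles lose points. A `repair` step removes overlapping rectangles; the best half of each generation survives unchanged, and crossover plus mutation fill the rest.

```python
import random


def inside(p, r):
    """True if dot p lies in rectangle r = (x0, y0, x1, y1)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def repair(rects):
    """Drop any rectangle that overlaps one already kept, so clusters stay disjoint."""
    kept = []
    for r in rects:
        if not any(overlaps(r, k) for k in kept):
            kept.append(r)
    return kept


def random_rect(rng, box):
    x0, x1 = sorted(rng.uniform(box[0], box[2]) for _ in range(2))
    y0, y1 = sorted(rng.uniform(box[1], box[3]) for _ in range(2))
    return (x0, y0, x1, y1)


def fitness(rects, points, rect_cost, area_cost):
    """Hypothetical score: +1 per covered dot, minus a cost per cluster and per unit area."""
    covered = sum(1 for p in points if any(inside(p, r) for r in rects))
    area = sum((r[2] - r[0]) * (r[3] - r[1]) for r in rects)
    return covered - rect_cost * len(rects) - area_cost * area


def mutate(rects, rng, box, scale):
    """Jitter every corner, sometimes add or delete a rectangle, then repair overlaps."""
    out = []
    for r in rects:
        x0, x1 = sorted((r[0] + rng.gauss(0, scale), r[2] + rng.gauss(0, scale)))
        y0, y1 = sorted((r[1] + rng.gauss(0, scale), r[3] + rng.gauss(0, scale)))
        out.append((x0, y0, x1, y1))
    if rng.random() < 0.2:
        out.append(random_rect(rng, box))
    if out and rng.random() < 0.2:
        out.pop(rng.randrange(len(out)))
    return repair(out)


def evolve(points, pop_size=40, generations=60, rect_cost=3.0, density_factor=2.0, seed=0):
    rng = random.Random(seed)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    box = (min(xs), min(ys), max(xs), max(ys))
    box_area = max((box[2] - box[0]) * (box[3] - box[1]), 1e-9)
    # Covered dots outweigh the area charge only where density exceeds density_factor x average.
    area_cost = density_factor * len(points) / box_area
    scale = 0.05 * max(box[2] - box[0], box[3] - box[1])

    def score(rects):
        return fitness(rects, points, rect_cost, area_cost)

    population = [repair([random_rect(rng, box) for _ in range(rng.randint(1, 4))])
                  for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        survivors = population[: pop_size // 2]  # the "winners" survive unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: the child inherits each parent rectangle with probability 1/2.
            child = [r for r in a + b if rng.random() < 0.5]
            children.append(mutate(child, rng, box, scale))
        population = survivors + children
    return max(population, key=score), history
```

Because the winners are copied into the next generation untouched, the best fitness in `history` can never decrease; `rect_cost` and `density_factor` are the knobs that trade "fewer clusters" against "denser clusters".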

余生分开走

2020-12-22 18:23

I think it depends on how much separation there is between the dots and the clusters. If the distances are large and irregular, I would first triangulate the points and then delete/hide all triangles with statistically large edge lengths. The remaining sub-triangulations form clusters of arbitrary shape. Traversing the edges of these sub-triangulations yields polygons, which can be used to determine which specific points lie in each cluster. The polygons can also be compared against known shapes, such as Kent Fredric's torus, as required.
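Writing a Delaunay triangulation from scratch would dwarf the idea, so this sketch swaps in a closely related structure: the Euclidean minimum spanning tree (MST), which is a subgraph of the Delaunay triangulation. For any fixed length cutoff, deleting the longer edges leaves the same connected components in both graphs, so the point memberships match; what the sketch does not produce is the boundary polygons. The "statistically large" rule used here (longer than the mean plus `k_sigma` standard deviations of the edge lengths) is an assumption, and it is computed on MST edges rather than triangle edges.

```python
import math
import statistics


def mst_edges(points):
    """Prim's algorithm on the complete graph, O(n^2); returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each point to the growing tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def clusters_by_edge_cutting(points, k_sigma=2.0):
    """Cut 'statistically large' edges (> mean + k_sigma * stdev); return point-index groups."""
    if not points:
        return []
    edges = mst_edges(points)
    if len(edges) < 2:
        return [list(range(len(points)))]
    lengths = [e[0] for e in edges]
    cutoff = statistics.mean(lengths) + k_sigma * statistics.pstdev(lengths)
    root = list(range(len(points)))  # union-find over the surviving short edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for length, i, j in edges:
        if length <= cutoff:
            root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

If you do need the polygons, a real Delaunay implementation is the better route; this version only answers "which dots belong together".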

IMO, grid methods are good for quick-and-dirty solutions, but they become very resource-hungry very quickly on sparse data. Quad trees are better, but TINs (triangulated irregular networks) are my personal favourite for any more complex analysis.
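For comparison, a quick-and-dirty grid method might look like the following sketch (one common form, not a specific published algorithm): bin the dots into square cells of side `cell`, treat cells holding at least `min_count` dots as dense, and join neighbouring dense cells (diagonals included) into clusters. Both parameters are hypothetical and have to be tuned to the data; dots in sparse cells are treated as noise. Storing only occupied cells in a dictionary, rather than a full 2-D array, is one way to limit the memory cost on sparse data.

```python
import math
from collections import defaultdict


def grid_clusters(points, cell, min_count):
    """Return clusters as sorted lists of point indices; dots in sparse cells are left out."""
    cells = defaultdict(list)  # sparse grid: only occupied cells are stored
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_count}
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        # Flood-fill across the 8 neighbours of each dense cell.
        seen.add(start)
        stack, members = [start], []
        while stack:
            cx, cy = stack.pop()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(sorted(members))
    return clusters
```

The result depends heavily on `cell`: too small and clusters fragment, too large and nearby clusters merge.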

梦毁少年i

2020-12-22 18:27

Cluster 3.0 includes a library of C routines for statistical clustering. It offers several different methods, which may or may not solve your problem depending on the form your dot clusters take. The library is available online and is distributed under the Python license.
