Fast (< n^2) clustering algorithm

情话喂你 2021-01-30 00:27

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be

6 Answers
  •  醉酒成梦
    2021-01-30 00:54

    You might like to try my research project called K-tree. Compared with k-means, it scales well to large inputs, and it forms a hierarchy of clusters. The trade-off is that it produces clusters with higher distortion. It has an average-case runtime of O(n log n) and a worst case of O(n^2), which only happens if you have some weird topology. More details of the complexity analysis are in my Master's thesis. I have used it with very high-dimensional text data and had no problems.

    Sometimes a bad split happens in the tree, where all the data goes to one side (cluster). The trunk in SVN handles this differently from the current release: it splits the data randomly when a bad split occurs. The previous method could force the tree to become too deep when there were bad splits.
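
The random-split fallback described above can be illustrated with a small sketch. This is not K-tree's actual code: the function names `two_means_split` and `mean`, the parameters `iters` and `seed`, and the use of a plain 2-means split are all assumptions made for illustration. It only shows the idea of replacing a degenerate split (everything on one side) with a random half-and-half split.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def two_means_split(points, iters=10, seed=0):
    """Split points into two groups with a few rounds of 2-means.

    If one side ends up empty (a "bad split"), fall back to a random
    half-and-half split so a recursive tree build keeps making progress.
    Hypothetical helper for illustration only.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to split")
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two initial centers picked at random
    left, right = [], []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            # Squared Euclidean distance to each center.
            d0 = sum((a - b) ** 2 for a, b in zip(p, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, centers[1]))
            (left if d0 <= d1 else right).append(p)
        if not left or not right:
            break  # bad split: all data went to one side
        centers = [mean(left), mean(right)]
    if not left or not right:
        # Fallback: shuffle a copy and cut it in half.
        shuffled = list(points)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        left, right = shuffled[:half], shuffled[half:]
    return left, right
```

With eight identical 5-dimensional points, for example, the 2-means step cannot separate anything, so the fallback returns two groups of four. Applied recursively, this halves the data at every degenerate node rather than producing a long chain of one-sided splits.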
