Fast (< n^2) clustering algorithm

孤城傲影 2021-01-30 00:34

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be

6 Answers
  •  攒了一身酷
    2021-01-30 00:55

    You might like to try my research project called K-tree. It scales to large inputs better than k-means does and forms a hierarchy of clusters. The trade-off is that it produces clusters with higher distortion. It has an average-case runtime of O(n log n) and a worst case of O(n^2), which only occurs with unusual data topologies. More details of the complexity analysis are in my Master's thesis. I have used it on very high-dimensional text data without any problems.
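To illustrate the general idea of a cluster hierarchy built from repeated two-way k-means splits, here is a minimal top-down (divisive) sketch in pure Python. It is not the author's K-tree and may differ from it in important ways; it is a simpler bisecting 2-means tree that assumes each node is split until it holds at most `max_leaf` points. The names `build_tree`, `two_means`, `leaves` and `max_leaf` are hypothetical. When splits are roughly balanced there are about log n levels, each costing O(n) per 2-means iteration, which gives roughly O(n log n) overall; very lopsided splits (for example, peeling off one point at a time) push the cost toward O(n^2).

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def two_means(points: List[Point], rng: random.Random,
              iters: int = 10) -> Tuple[List[Point], List[Point]]:
    """Lloyd's algorithm with k = 2 on at least two points.
    Either side may come back empty, e.g. if both random seeds are identical points."""
    c1, c2 = rng.sample(points, 2)
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            # Ties go to the left centroid.
            (left if sq_dist(p, c1) <= sq_dist(p, c2) else right).append(p)
        if not left or not right:
            break  # degenerate split; the caller decides what to do
        c1, c2 = centroid(left), centroid(right)
    return left, right


def build_tree(points: List[Point], max_leaf: int, rng: random.Random) -> Dict:
    """Recursively bisect with 2-means until a node holds <= max_leaf points.
    Every node stores its centroid; leaves also store their points."""
    node: Dict = {"centroid": centroid(points)}
    if len(points) <= max_leaf:
        node["points"] = points
        return node
    left, right = two_means(points, rng)
    if not left or not right:
        # Bad split: this simplified sketch just stops and keeps an oversized leaf.
        node["points"] = points
        return node
    node["children"] = [build_tree(left, max_leaf, rng),
                        build_tree(right, max_leaf, rng)]
    return node


def leaves(node: Dict) -> List[List[Point]]:
    """Collect the leaf clusters of the hierarchy, left to right."""
    if "points" in node:
        return [node["points"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 2,000 random 5-D points; a million points would call for a faster implementation.
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(5)) for _ in range(2000)]
    tree = build_tree(data, max_leaf=50, rng=rng)
    clusters = leaves(tree)
    print(len(clusters), "clusters, largest has", max(len(c) for c in clusters), "points")
```

The leaves returned by `leaves(tree)` can serve as the clusters, with `max_leaf` indirectly controlling their number and size. This sketch simply stops splitting when 2-means puts every point on one side, so such a node can stay larger than `max_leaf`. For a million points you would likely want a vectorized (NumPy) or compiled implementation rather than pure Python.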

    Sometimes a bad split happens in the tree, where all the data goes to one side (one cluster). The trunk in SVN handles this differently from the current release: when a split is bad, it splits the data randomly instead. The previous method can force the tree to become too deep when bad splits occur.
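The random-split fallback described above can be sketched as a small helper. `fix_bad_split` is a hypothetical function, not code from the K-tree repository, and it assumes that "splits the data randomly" means shuffling the node's points and cutting them into two halves; the actual SVN trunk may partition them differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fix_bad_split(points: Sequence[T], left: List[T], right: List[T],
                  rng: random.Random) -> Tuple[List[T], List[T]]:
    """Return (left, right) unchanged if both sides are non-empty.
    Otherwise (a bad split, everything on one side) fall back to a random
    split of the node's points into two halves. Expects at least two points."""
    if left and right:
        return left, right
    shuffled = list(points)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

A split routine would call it right after its k-means step, e.g. `left, right = fix_bad_split(points, left, right, rng)`. In the fallback case both children get roughly half of the node's points, so a bad split can no longer push the tree deeper one point at a time.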
