Fast (< n^2) clustering algorithm

孤城傲影 2021-01-30 00:34

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be

6 answers
  •  小蘑菇 (original poster)
    2021-01-30 00:58

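    People have the impression that k-means is slow, but slowness is really only an issue for the batch, EM-style algorithm (Lloyd's), which makes a full pass over all n points on every iteration. Stochastic gradient methods for k-means update the centers from one point or a small mini-batch at a time and are orders of magnitude faster (see www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf).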

    An implementation is here: http://code.google.com/p/sofia-ml/wiki/SofiaKMeans
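
To make the stochastic idea concrete, here is a minimal pure-Python sketch of mini-batch k-means with a per-center learning rate of 1/count. It is not the sofia-ml code: the names `minibatch_kmeans` and `assign`, the default parameters, and the choice to initialize from randomly sampled data points are all assumptions made for illustration.

```python
import random


def assign(point, centers):
    """Return the index of the center nearest to `point` (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def minibatch_kmeans(points, k, batch_size=100, iters=200, seed=0):
    """Illustrative mini-batch k-means; returns a list of k centers.

    `points` is a list of equal-length numeric sequences. Defaults are
    hypothetical and would need tuning on real data.
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each center has absorbed so far

    for _ in range(iters):
        # Draw a small random batch instead of scanning all n points.
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch against the current centers first...
        labels = [assign(p, centers) for p in batch]
        # ...then take one gradient step per point, with step size 1/count.
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers
```

Each iteration costs on the order of batch_size * k * d distance operations, independent of the number of points n, so the total work stays far below n^2. After fitting, `assign(p, centers)` labels any point in a single pass over the k centers. This sketch does not enforce a maximum within-cluster distance; that would have to be checked separately.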
