Is the triangle inequality necessary for k-means?

灰色年华 2021-02-06 16:40

I wonder whether the triangle inequality is necessary for the distance measure used in k-means.

2 answers
  •  死守一世寂寞
    2021-02-06 17:35

    k-means is designed for Euclidean distance, which happens to satisfy the triangle inequality.

    Using other distance functions is risky, as the algorithm may stop converging. The reason, however, is not the triangle inequality, but that the mean might not minimize the distance function. (The arithmetic mean minimizes the sum of squared deviations, not arbitrary distances!)
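
To make the point about the mean concrete, here is a small sketch on made-up 1-D data. The names `sum_squared` and `sum_absolute` are illustrative helpers, not part of any library. It compares the arithmetic mean and the median as candidate centers under two different cost functions.

```python
from statistics import mean, median

# Toy 1-D cluster with one outlier (made-up data for illustration).
points = [1.0, 2.0, 3.0, 10.0]

def sum_squared(center):
    """Sum of squared deviations: the cost that k-means minimizes."""
    return sum((p - center) ** 2 for p in points)

def sum_absolute(center):
    """Sum of absolute deviations: a different, non-squared cost."""
    return sum(abs(p - center) for p in points)

m = mean(points)      # 4.0
med = median(points)  # 2.5

# Under the squared cost the mean is the better center ...
print(sum_squared(m), sum_squared(med))    # 50.0 59.0
# ... but under the absolute cost the median beats the mean.
print(sum_absolute(m), sum_absolute(med))  # 12.0 10.0
```

So if the assignment step used absolute distance while the update step still used the mean, the update could increase the cost it is supposed to decrease.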

    There are faster methods for k-means that exploit the triangle inequality to avoid recomputations. But if you stick to classic MacQueen or Lloyd k-means, then you do not need the triangle inequality.
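
As a rough sketch of how such accelerated methods can use the triangle inequality: if the distance between the current best center and another center is at least twice the distance from the point to the current best center, the other center cannot be closer, so its distance need not be computed. The function `assign_with_pruning` below is a hypothetical illustration of that single pruning rule, not the implementation of any particular published algorithm.

```python
import math

def assign_with_pruning(point, centers):
    """Return (index of nearest center, number of point-to-center distances computed)."""
    k = len(centers)
    # Center-to-center distances; a real implementation would compute
    # these once per iteration and reuse them for every point.
    cc = [[math.dist(a, b) for b in centers] for a in centers]

    best = 0
    best_d = math.dist(point, centers[0])
    computed = 1
    for j in range(1, k):
        # Triangle inequality: d(c_best, c_j) <= d(x, c_best) + d(x, c_j).
        # Hence d(c_best, c_j) >= 2 * d(x, c_best) implies
        # d(x, c_j) >= d(x, c_best), and c_j can be skipped.
        if cc[best][j] >= 2 * best_d:
            continue
        d = math.dist(point, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed

centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
idx, n_computed = assign_with_pruning((1.0, 1.0), centers)
print(idx, n_computed)  # 0 1
```

The pruning step is only valid because the distance satisfies the triangle inequality; with a distance that does not, the skipped center might actually be closer.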

    Just be careful when using other distance functions, so that you do not run into an infinite loop. You need to prove that the mean minimizes the sum of your distances from the cluster members to the cluster center. If you cannot prove this, the algorithm may fail to converge, because the objective function is no longer guaranteed to decrease monotonically. So you really should try to prove convergence for your distance function!
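
One practical safeguard, short of a proof, is to cap the number of iterations and record the objective after every step. The sketch below is a minimal Lloyd-style loop with a pluggable distance and center update. The names `lloyd`, `sq_euclid` and `mean_point` are hypothetical, and the data is made up. A `monotone` result of `True` is only an empirical observation on one run, not a proof of convergence.

```python
def lloyd(points, centers, dist, update, max_iter=100):
    """Lloyd-style loop. Returns (centers, objective history, monotone flag)."""
    history = []
    for _ in range(max_iter):  # hard cap guards against endless cycling
        # Assignment step: each point goes to its nearest center under `dist`.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Update step: recompute each center from its members
        # (an empty cluster keeps its old center).
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(update(members) if members else old)
        # Objective after both steps of this iteration.
        history.append(sum(dist(p, new_centers[lab])
                           for p, lab in zip(points, labels)))
        if new_centers == centers:  # fixed point reached
            break
        centers = new_centers
    # Did the objective ever increase (beyond a small float tolerance)?
    monotone = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
    return centers, history, monotone

def sq_euclid(p, c):
    """Squared Euclidean distance, the cost the mean is known to minimize."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def mean_point(members):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(col) / n for col in zip(*members))

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, history, monotone = lloyd(points, [(0.0, 0.0), (1.0, 0.0)],
                                   sq_euclid, mean_point)
print(centers, monotone)
```

Passing a different `dist` while keeping `mean_point` as the update is exactly the risky combination described above; in that case the history is worth inspecting, and the iteration cap keeps the loop from running forever.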
