I am reading about the difference between k-means clustering and k-medoids clustering.
Supposedly there is an advantage to using the pairwise distance measure in the k-medoid algorithm, instead of the more familiar squared Euclidean distance used by k-means. Why is that an advantage?
First of all, you can use k-medoids with any similarity or dissimilarity measure. K-means, however, may fail to converge; it really must only be used with distances that are consistent with the mean. So, for example, absolute Pearson correlation must not be used with k-means, but it works well with k-medoids.
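To see why any measure works, here is a minimal sketch (the function names and the choice of 1 - |Pearson correlation| as the dissimilarity are mine, for illustration): the medoid step only ever evaluates the measure pairwise, so no "mean under this measure" needs to exist.

```python
import numpy as np

def abs_pearson_dissimilarity(x, y):
    # 1 - |r|: treats strongly correlated AND anti-correlated series as close.
    return 1.0 - abs(np.corrcoef(x, y)[0, 1])

def medoid(points, dissimilarity):
    # The medoid is the actual data point minimizing the summed
    # dissimilarity to all other points -- no mean is ever computed.
    costs = [sum(dissimilarity(p, q) for q in points) for p in points]
    return int(np.argmin(costs))

series = [np.array([1.0, 2.0, 3.0, 4.0]),
          np.array([1.0, 2.5, 2.0, 4.5]),
          np.array([5.0, 1.0, 4.0, 2.0])]
print(medoid(series, abs_pearson_dissimilarity))  # index of the most central series
```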
Secondly, the medoid as used by k-medoids is roughly comparable to the median (in fact, there is also k-medians, which is like k-means but for Manhattan distance). If you look up the literature on the median, you will find plenty of explanations and examples of why the median is more robust to outliers than the arithmetic mean. Essentially, these explanations and examples also hold for the medoid: it is a more robust estimate of a representative point than the mean as used in k-means.
Consider this 1-dimensional example:
[1, 2, 3, 4, 100000]
Both the median and medoid of this set are 3. The mean is 20002.
Which do you think is more representative of the data set? The mean has the lower squared error, but assuming that there might be a measurement error in this data set ...
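A quick check of those numbers (the medoid here is computed under plain absolute distance, which is what 1-dimensional Euclidean distance reduces to):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 100000], dtype=float)

print(np.mean(data))    # 20002.0
print(np.median(data))  # 3.0

# The medoid: the member of the set with the smallest summed
# distance to all other members (plain |x - y| distance here).
costs = [np.abs(data - x).sum() for x in data]
print(data[int(np.argmin(costs))])  # 3.0
```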
Technically, the notion of a breakdown point is used in statistics. The median has a breakdown point of 50% (i.e. up to half of the data points can be corrupted without making the estimate arbitrarily bad), whereas the mean has a breakdown point of 0 (i.e. a single extreme observation can make the estimate arbitrarily bad).
I do not have a proof, but I assume the medoid will have a breakdown point similar to that of the median.
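As a rough illustration of that claim (not a proof): corrupting 40% of a small sample leaves the median untouched, while the mean explodes.

```python
import numpy as np

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bad = np.array([1.0, 2.0, 3.0, 1e9, 1e9])  # 2 of 5 values corrupted (40%)

print(np.median(clean), np.median(bad))  # 3.0 3.0  -- unchanged
print(np.mean(clean), np.mean(bad))      # 3.0 400000001.2
```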
The main drawback of k-medoids is runtime. Usually, PAM takes much longer to run than k-means: as it involves computing all pairwise distances, it is in O(n^2 k i), whereas k-means runs in O(n k i), where usually k times the number of iterations satisfies k*i << n.
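To make that cost visible, here is a sketch of a naive alternating k-medoids variant (simpler than PAM's swap-based search, so take it as an assumption-laden illustration, not PAM itself). The (n, n) distance matrix it consumes is exactly the O(n^2) part that k-means never has to build.

```python
import numpy as np

def k_medoids(D, k, iters=100, seed=0):
    # D is the full (n, n) pairwise distance matrix -- computing and
    # scanning it is what makes k-medoids costlier than k-means.
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        # Assign each point to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Re-pick each cluster's medoid: the member with the smallest
        # summed distance to the rest of its cluster.
        new = np.array([m[np.argmin(D[np.ix_(m, m)].sum(axis=1))]
                        for m in (np.where(labels == c)[0] for c in range(k))])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

X = np.array([[1.0], [2.0], [3.0], [4.0], [100000.0]])
D = np.abs(X - X.T)          # the O(n^2) pairwise distance matrix
print(k_medoids(D, k=2))     # indices of the chosen medoids
```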