Cluster analysis in R: determine the optimal number of clusters


Being a newbie in R, I'm not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of the data below, how many clusters will be


7 answers

清歌不尽

2020-11-22 10:54
Splendid answer from Ben. However, I'm surprised that the Affinity Propagation (AP) method has been suggested here only as a way to find the number of clusters for k-means, when in general AP does a better job of clustering the data. Please see the paper in Science supporting this method:

Frey, Brendan J., and Delbert Dueck. "Clustering by passing messages between data points." Science 315.5814 (2007): 972-976.

So if you are not biased toward k-means, I suggest using AP directly, which clusters the data without needing to know the number of clusters in advance:
```
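library(apcluster)
# Similarity = negative squared Euclidean distance between the rows of `data`
apclus = apcluster(negDistMat(r=2), data)
# Print a summary of the result (exemplars and cluster memberships)
show(apclus)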
```
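To make this concrete, here is a minimal sketch on made-up data showing how to read the number of clusters off the result. The `toy` matrix and the variable names are my own assumptions for illustration, not part of the question's data, and the cluster count you get depends on the data and on AP's input preference.

```r
library(apcluster)

# Toy data (an assumption for illustration): two separated 2-D blobs, 50 rows each
set.seed(42)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

# Same call as above: similarity = negative squared Euclidean distance
ap <- apcluster(negDistMat(r = 2), toy)

# Number of clusters AP settled on, and the row indices of the exemplars
n_clusters <- length(ap@clusters)
exemplar_rows <- ap@exemplars
print(n_clusters)
print(exemplar_rows)
```

If AP returns more clusters than seems sensible, the `q` argument of apcluster() sets the input preference as a quantile of the similarities; as far as I know, lower values tend to give fewer clusters.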
If negative Euclidean distances are not appropriate, you can use one of the other similarity measures provided in the same package. For example, for similarities based on Spearman correlations, this is what you need:
```
sim = corSimMat(data, method="spearman")
apclus = apcluster(s=sim)
```
Please note that the similarity functions in the apcluster package are provided just for convenience. In fact, the apcluster() function accepts any similarity matrix, such as a matrix of correlations. What was done above with corSimMat() can also be done with this:
```
sim = cor(data, method="spearman")
```
or
```
sim = cor(t(data), method="spearman")
```
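depending on whether you want to cluster the columns or the rows of your matrix: cor() computes correlations between columns, so the first form gives column similarities and the second (on the transpose) gives row similarities.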
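A quick way to check which one you are getting is to look at the dimensions of the similarity matrix. This is only a sketch on a made-up matrix (`toy` is an assumption for illustration) using base R:

```r
# Toy matrix (an assumption for illustration): 6 rows, 4 columns
set.seed(1)
toy <- matrix(rnorm(24), nrow = 6, ncol = 4)

# cor() correlates columns, so this is a 4 x 4 matrix of column similarities
sim_cols <- cor(toy, method = "spearman")

# Transposing first gives a 6 x 6 matrix of row similarities
sim_rows <- cor(t(toy), method = "spearman")

print(dim(sim_cols))
print(dim(sim_rows))
```

Whichever matrix you pass to apcluster(s=...), the items being clustered are the ones indexing its rows and columns.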