Cluster analysis in R: determine the optimal number of clusters

星月不相逢 2020-11-22 10:28

Being a newbie in R, I'm not very sure how to choose the best number of clusters for a k-means analysis. After plotting a subset of the data below, how many clusters will be appropriate?

  • 2020-11-22 10:54

    Splendid answer from Ben. However, I'm surprised that the Affinity Propagation (AP) method has been suggested here just to find the number of clusters for the k-means method, when in general AP does a better job of clustering the data. Please see the scientific paper supporting this method, published in Science:

    Frey, Brendan J., and Delbert Dueck. "Clustering by passing messages between data points." Science 315.5814 (2007): 972-976.

    So if you are not committed to k-means, I suggest using AP directly, since it clusters the data without requiring you to specify the number of clusters in advance:

    library(apcluster)
    # similarity = negative squared Euclidean distance between the rows of data
    apclus = apcluster(negDistMat(r=2), data)
    show(apclus)
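    For intuition, negDistMat(r=2) turns the data into similarities by taking the negative squared Euclidean distance between each pair of rows (the objects being clustered). The base-R sketch below builds that kind of matrix on made-up toy data; the data and variable names are hypothetical, and it is meant to show what the similarity looks like rather than to reproduce apcluster's internals exactly.

    ```r
    # Hypothetical toy data: 5 objects (rows) described by 2 features (columns)
    set.seed(1)
    data <- matrix(rnorm(10), nrow = 5, ncol = 2)

    # Pairwise Euclidean distances between rows, as a full 5 x 5 matrix
    d <- as.matrix(dist(data, method = "euclidean"))

    # Negative squared distances: 0 on the diagonal, more negative = less similar
    sim <- -d^2

    round(sim, 2)
    ```

    AP picks exemplars from a matrix like this one. The number of clusters it ends up with is governed by the preference values rather than by a fixed k; when p is left at its default (NA), apcluster uses the median of the similarities as the preference.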
    

    If negative squared Euclidean distances are not appropriate, you can use one of the other similarity measures provided in the same package. For example, for similarities based on Spearman correlations, this is what you need:

    sim = corSimMat(data, method="spearman")
    apclus = apcluster(s=sim)
    

    Please note that the similarity functions in the apcluster package are provided only for convenience. The apcluster() function accepts any similarity matrix, such as a correlation matrix, so the corSimMat() call above can also be replaced with:

    sim = cor(data, method="spearman")
    

    or

    sim = cor(t(data), method="spearman")
    

    depending on whether you want to cluster the columns (cor(data)) or the rows (cor(t(data))) of your matrix.
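    Since cor() computes correlations between the columns of a matrix, transposing first switches what ends up being clustered. A small sketch on hypothetical toy data (6 rows, 3 columns; the names sim_cols and sim_rows are illustrative only) shows the size of each similarity matrix:

    ```r
    # Hypothetical toy data: 6 observations (rows) measured on 3 variables (columns)
    set.seed(1)
    data <- matrix(rnorm(18), nrow = 6, ncol = 3)

    # Correlations between columns: a 3 x 3 variable-by-variable similarity matrix
    sim_cols <- cor(data, method = "spearman")

    # Correlations between rows (via the transpose): a 6 x 6 observation-by-observation matrix
    sim_rows <- cor(t(data), method = "spearman")

    dim(sim_cols)  # 3 3
    dim(sim_rows)  # 6 6
    ```

    Either matrix can then be passed as apcluster(s=sim_cols) to group the 3 variables, or as apcluster(s=sim_rows) to group the 6 observations.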
