Cluster analysis in R: determine the optimal number of clusters

后端未结

关注

 7  1926

星月不相逢 2020-11-22 10:28

Being a newbie in R, I\'m not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of below data, how many clusters will be

7条回答

情歌与酒 (楼主)

2020-11-22 10:44

In order to determine optimal k-cluster in clustering methods. I usually using Elbow method accompany by Parallel processing to avoid time-comsuming. This code can sample like this:

Elbow method

elbow.k <- function(mydata){
dist.obj <- dist(mydata)
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)
k <- elbow.obj$k
return(k)
}

Running Elbow parallel

no_cores <- detectCores()
    cl<-makeCluster(no_cores)
    clusterEvalQ(cl, library(GMD))
    clusterExport(cl, list("data.clustering", "data.convert", "elbow.k", "clustering.kmeans"))
 start.time <- Sys.time()
 elbow.k.handle(data.clustering))
 k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering))
    end.time <- Sys.time()
    cat('Time to find k using Elbow method is',(end.time - start.time),'seconds with k value:', k.clusters)

It works well.

0 讨论(0)

查看其它7个回答