Being a newbie in R, I\'m not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of below data, how many clusters will be
In order to determine optimal k-cluster in clustering methods. I usually using Elbow
method accompany by Parallel processing to avoid time-comsuming. This code can sample like this:
Elbow method
elbow.k <- function(mydata){
dist.obj <- dist(mydata)
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)
k <- elbow.obj$k
return(k)
}
Running Elbow parallel
no_cores <- detectCores()
cl<-makeCluster(no_cores)
clusterEvalQ(cl, library(GMD))
clusterExport(cl, list("data.clustering", "data.convert", "elbow.k", "clustering.kmeans"))
start.time <- Sys.time()
elbow.k.handle(data.clustering))
k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering))
end.time <- Sys.time()
cat('Time to find k using Elbow method is',(end.time - start.time),'seconds with k value:', k.clusters)
It works well.