Cluster analysis in R: determine the optimal number of clusters

星月不相逢 2020-11-22 10:28

Being new to R, I'm not sure how to choose the best number of clusters for a k-means analysis. After plotting a subset of the data below, how many clusters will be

7 answers
  •  情歌与酒
    2020-11-22 10:44

    To determine the optimal number of clusters k, I usually use the elbow method together with parallel processing to cut down the running time. Sample code:

    Elbow method

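    library(GMD)  # provides css.hclust() and elbow.batch()

    # Pick k with the elbow method on a hierarchical clustering of `mydata`
    elbow.k <- function(mydata) {
      dist.obj <- dist(mydata)                     # pairwise distances
      hclust.obj <- hclust(dist.obj)               # hierarchical clustering tree
      css.obj <- css.hclust(dist.obj, hclust.obj)  # clustering sums of squares across k
      elbow.obj <- elbow.batch(css.obj)            # locate the elbow
      k <- elbow.obj$k
      return(k)
    }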
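The elbow.k() function above builds its sum-of-squares curve from a hierarchical clustering tree via the GMD package. Since the question asks about k-means, here is a minimal base-R sketch of the k-means variant of the elbow method, under stated assumptions: it computes the total within-cluster sum of squares (`tot.withinss` returned by `kmeans()`) for k = 1..k.max, then picks the elbow with one common heuristic, the point farthest from the straight line joining the first and last points of the rescaled curve. That heuristic is an assumption and is not the rule GMD's elbow.batch() applies; `wss_curve()` and `pick_elbow()` are hypothetical helper names, and the data is synthetic.

```r
# Sketch: elbow method for k-means in base R (stats::kmeans).
# wss_curve() and pick_elbow() are hypothetical helper names.

# Total within-cluster sum of squares for k = 1..k.max;
# nstart > 1 restarts k-means from several random centres to avoid poor local optima
wss_curve <- function(x, k.max = 10, nstart = 25) {
  sapply(seq_len(k.max), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  })
}

# Assumed elbow heuristic: rescale k and WSS to [0, 1], then return the k whose
# point is farthest from the line through the first and last points of the curve
pick_elbow <- function(wss) {
  n <- length(wss)
  x <- (seq_len(n) - 1) / (n - 1)
  y <- (wss - min(wss)) / (max(wss) - min(wss))
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # Perpendicular distance from each (x, y) point to that chord
  d <- abs(dy * x - dx * y + x[n] * y[1] - y[n] * x[1]) / sqrt(dx^2 + dy^2)
  which.max(d)  # index equals k because the curve starts at k = 1
}

# Usage on synthetic data: three well-separated 2-D blobs of 50 points each
set.seed(42)
centres <- rbind(c(0, 0), c(10, 0), c(0, 10))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, mean = centres[i, 1]), rnorm(50, mean = centres[i, 2]))
}))
wss <- wss_curve(pts, k.max = 10)
k.best <- pick_elbow(wss)
print(k.best)
```

Plotting the curve with plot(seq_along(wss), wss, type = "b") is the usual visual check; on this synthetic data the sharp bend should appear at k = 3.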
    

    Running the elbow method in parallel

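    library(parallel)

    # data.clustering: your data (numeric matrix or data frame); elbow.k() as defined above
    no_cores <- detectCores()
    cl <- makeCluster(no_cores)
    clusterEvalQ(cl, library(GMD))                      # load GMD on every worker
    clusterExport(cl, c("data.clustering", "elbow.k"))  # objects the workers need

    start.time <- Sys.time()
    # X = 1 is a single task, so this runs elbow.k() once on one worker;
    # the computation inside elbow.k() is not split across cores
    k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering))
    end.time <- Sys.time()
    stopCluster(cl)

    elapsed <- as.numeric(difftime(end.time, start.time, units = "secs"))
    cat("Time to find k using the elbow method:", elapsed, "seconds; k =", k.clusters, "\n")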
    

    This works well for me.
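Because parSapply() in the answer's code is called with a single task (X = 1), only one worker does any work. A minimal sketch of a version that does spread the work, assuming k-means in place of the answer's GMD/hclust approach: each candidate k gets its own k-means fit on a separate worker, and the returned within-cluster sums of squares form the elbow curve. `wss_for_k()` and `parallel_wss()` are hypothetical helpers, the default of two workers is an arbitrary assumption, and the built-in iris data is used only for illustration.

```r
library(parallel)

# Hypothetical helper: total within-cluster SS of one k-means fit.
# Defined at top level so its enclosing environment is not serialised to the
# workers; x and nstart are passed in as arguments instead.
wss_for_k <- function(k, x, nstart) {
  stats::kmeans(x, centers = k, nstart = nstart)$tot.withinss
}

# Hypothetical helper: one task per candidate k, spread over n.cores workers
parallel_wss <- function(x, k.max = 10, nstart = 25, n.cores = 2) {
  cl <- makeCluster(n.cores)
  on.exit(stopCluster(cl))      # release the workers even if a fit fails
  clusterSetRNGStream(cl, 123)  # reproducible random starts on the workers
  parSapply(cl, seq_len(k.max), wss_for_k, x = x, nstart = nstart)
}

# Usage: WSS curve for the standardised iris measurements
x <- scale(iris[, 1:4])
wss <- parallel_wss(x, k.max = 8)
print(round(wss, 1))
```

The returned vector can be inspected with plot(wss, type = "b") to look for the elbow. Parallelising over k only pays off when each k-means fit is slow relative to the cost of starting the workers.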
