R cut dendrogram into groups with minimum size

穿精又带淫゛_ submitted on 2019-12-03 08:17:31

Thanks to @Vlo and @lukeA, I was able to implement a loop. I'm posting it only as a starting point, and I'm certainly open to a more elegant solution.

# `hc` is the hclust object being cut

# Flatten a nested list of dendrograms (from Vlo's answer)
unnest <- function(x) {
  if (is.null(names(x))) x
  else c(list(all = unname(unlist(x))), do.call(c, lapply(x, unnest)))
}

# Candidate cut heights: just above each merge height
cuts <- hc$height + 1e-9

min_size <- 10
smallest <- 0
i <- 0

# Raise the cut until the smallest cluster has at least min_size members
while (smallest < min_size && i <= length(cuts)) {
  h_i <- cuts[i <- i + 1]
  if (i > length(cuts)) {
    warning("Couldn't find a cluster big enough.")
  } else {
    smallest <- min(unlist(
      lapply(X = unnest(cut(as.dendrogram(hc), h = h_i)$lower),
             FUN = attr, which = "members"))) # from lukeA's comment
  }
}
h_i # returns desired output: [1] 3.79211
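The same search can be sketched with base R's `cutree()` on the `hclust` object, so the tree is not converted to a dendrogram and cut again at every step. This is a minimal sketch, not the poster's code: `min_size_cut_height` and the toy data are hypothetical, and it assumes the merge heights increase monotonically (true for linkages such as `"complete"` and `"average"`; `cutree(h = ...)` refuses unsorted heights).

```r
# Hypothetical helper: the lowest cut height at which every cluster
# has at least `min_size` members (NA with a warning if there is none).
min_size_cut_height <- function(hc, min_size) {
  # Try heights just above each merge, as the loop above does
  cuts <- sort(hc$height) + 1e-9
  for (h in cuts) {
    sizes <- table(cutree(hc, h = h))  # members per cluster at this height
    if (min(sizes) >= min_size) return(h)
  }
  warning("Couldn't find a cluster big enough.")
  NA_real_
}

# Toy data: a group of 3 points and a group of 4 points, far apart
x <- c(0, 1, 3, 20, 22, 25, 29)
hc_toy <- hclust(dist(x))        # complete linkage by default
min_size_cut_height(hc_toy, 3)   # 9 + 1e-9, printed as [1] 9
```

Because the smallest cluster can only grow as the cut height rises, the first height that passes the test is the lowest one that does.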

This feature is available in the dendextend package through the heights_per_k.dendrogram function, which also has a faster C++ implementation when the dendextendRcpp package is loaded.

library(dendextend)
hc <- hclust(dist(USArrests[1:4,]), "ave")
dend <- as.dendrogram(hc)
heights_per_k.dendrogram(dend)
##        1        2        3        4
## 86.47086 68.84745 45.98871 28.36531
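Without dendextend, the numbers above can be reproduced in base R. The rule below (take the distinct merge heights, shift each one down by half the smallest gap between them, and add one height above the root) was chosen because it reproduces the four printed values; it is an assumption, not necessarily dendextend's exact implementation. `heights_per_k_base` is a hypothetical name, and the sketch assumes at least two distinct merge heights and no tied heights.

```r
# Hypothetical base-R analogue of heights_per_k.dendrogram() for an hclust object:
# one cut height per number of clusters k = 1, 2, ...
heights_per_k_base <- function(hc) {
  h <- sort(unique(hc$height), decreasing = TRUE)  # distinct merge heights, root first
  half_gap <- min(-diff(h)) / 2                    # half the smallest gap between them
  cut_at <- c(max(h) + half_gap, h - half_gap)     # above the root, then between merges
  names(cut_at) <- seq_along(cut_at)               # label each height by its k
  cut_at
}

hc <- hclust(dist(USArrests[1:4, ]), "ave")
hpk <- heights_per_k_base(hc)
hpk  # same values as the dendextend output above
sapply(seq_along(hpk), function(k) max(cutree(hc, h = hpk[[k]])))  # 1 2 3 4
```

Cutting halfway into the narrowest gap keeps every cut height strictly between two merges, so `cutree(hc, h = hpk[k])` yields exactly `k` clusters.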

As a side note, the dendextend package also provides a cutree.dendrogram S3 method, which works much like cutree does for hclust objects.
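For plain hclust objects, base R's `cutree()` also accepts a vector of `k` values and returns one column of cluster assignments per `k`. That makes it cheap to tabulate the smallest cluster for every possible number of groups and then keep the largest `k` that still respects a minimum size. A minimal sketch; the toy data and `min_size <- 3` are hypothetical.

```r
# Toy data: a group of 3 points and a group of 4 points, far apart
x <- c(0, 1, 3, 20, 22, 25, 29)
hc <- hclust(dist(x))          # complete linkage by default
n <- length(hc$order)          # number of observations

memb <- cutree(hc, k = 1:n)    # n x n matrix: column j holds the grouping for k = j
smallest_per_k <- apply(memb, 2, function(g) min(tabulate(g)))
smallest_per_k                 # smallest cluster size for each k: 7 3 2 1 1 1 1

min_size <- 3
best_k <- max(which(smallest_per_k >= min_size))  # most groups that all reach min_size
table(memb[, best_k])          # sizes of those groups: 3 and 4
```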


This doesn't answer the question directly, but it might be useful for extracting members if you decide to loop over values of h.

Stealing and modifying some code from here

# Unnest the list/dendrogram structure
unnest <- function(x) {
  if (is.null(names(x))) {
    x
  } else {
    c(list(all = unname(unlist(x))), do.call(c, lapply(x, unnest)))
  }
}

# Extract the `members` attribute from each dendrogram
lapply(X = unnest(cut(as.dendrogram(hc), h = 3.8)), FUN = attr, which = "members")


# Please don't ask me why there are 2 dendrograms stored
# in the `$upper` list while `print` displays one

[1] 2

[1] 2

[1] 66

[1] 11

[1] 24

[1] 49
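If only the cluster sizes below the cut are needed, the `$lower` element returned by `cut()` already holds one dendrogram per branch, so the `members` attribute can be read from it directly without `unnest()`. When the whole result is unnested, `unnest()` returns the `$upper` dendrogram unchanged (it has no names) and `c()` splices its two top-level branches in as separate entries, which is most likely where the two `$upper` values come from. A minimal sketch using the four-state `USArrests` example from the dendextend answer; the cut height of 50 is an arbitrary choice for illustration.

```r
hc <- hclust(dist(USArrests[1:4, ]), "ave")
dend <- as.dendrogram(hc)

# cut() returns list(upper = <pruned top of the tree>, lower = <branches below h>)
pieces <- cut(dend, h = 50)

# One "members" count per branch below the cut; single leaves count as 1
sizes <- sapply(pieces$lower, attr, which = "members")
sort(sizes)  # 1 1 2: Alabama + Alaska merge below 50, Arizona and Arkansas stay single
sum(sizes)   # 4, every observation lands in exactly one branch
```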