How to color a dendrogram's labels according to defined groups? (in R)

情深已故 2021-01-13 03:46

I have a numeric matrix in R with 24 rows and 10,000 columns. The row names of this matrix are basically file names from which I have read the data corresponding to each of

3 Answers
  • 2021-01-13 04:05

    I suspect the function you are looking for is either color_labels or get_leaves_branches_col. The first colors your labels based on cutree (as color_branches does for branches). The second returns the branch color of each leaf, which you can then use to color the labels of the tree; this helps when the branches were colored by an unusual method (as happens when using branches_attr_by_labels). For example:

    # define dendrogram object to play with:
    hc <- hclust(dist(USArrests[1:5,]), "ave")
    dend <- as.dendrogram(hc)
    
    library(dendextend)
    par(mfrow = c(1,2), mar = c(5,2,1,0))
    dend <- dend %>%
             color_branches(k = 3) %>%
             set("branches_lwd", c(2,1,2)) %>%
             set("branches_lty", c(1,2,1))
    
    plot(dend)
    
    dend <- color_labels(dend, k = 3)
    # The same as:
    # labels_colors(dend)  <- get_leaves_branches_col(dend)
    plot(dend)
    


    Either way, it is worth having a look at the set function for ideas on what can be done to your dendrogram (it saves the hassle of remembering all the different function names).
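
    When the groups are defined in advance rather than derived from cutree, the labels can also be colored directly. The following is a minimal sketch, assuming dendextend is installed; the names groups and palette_by_group and the group assignments themselves are made up for illustration and are not part of the question's data.

```r
library(dendextend)

hc <- hclust(dist(USArrests[1:5, ]), "ave")
dend <- as.dendrogram(hc)

# Hypothetical group per row, named by row name (illustrative values only)
groups <- c(Alabama = "south", Alaska = "west", Arizona = "west",
            Arkansas = "south", California = "west")
# Hypothetical mapping from group to color
palette_by_group <- c(south = "red", west = "blue")

# labels(dend) is in plotting (left-to-right) order, so look colors up by label
leaf_cols <- unname(palette_by_group[groups[labels(dend)]])
labels_colors(dend) <- leaf_cols

plot(dend)
```

    Looking the colors up by label, rather than by row position, keeps each color attached to the right leaf even though the dendrogram reorders the rows.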

  • 2021-01-13 04:07

    You may try this solution: just replace 'labs' with your 'MS.groups' and 'var' with your 'MS.groups' converted to numeric (perhaps with as.numeric). It comes from "How to colour the labels of a dendrogram by an additional factor variable in R".

    ## The data
    df <- structure(list(labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", 
    "a8", "b1", "b2", "b3", "b4", "b5", "b6", "b7"), var = c(1L, 1L, 2L,     
    1L,2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), td = c(13.1, 14.5, 16.7, 
    12.9, 14.9, 15.6, 13.4, 15.3, 12.8, 14.5, 14.7, 13.1, 14.9, 15.6, 14.6), 
    fd = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 1L, 4L, 3L, 3L)), 
    .Names = c("labs", "var", "td", "fd"), class = "data.frame", row.names = 
    c(NA, -15L))
    
    ## Subset for clustering
    df.nw = df[,3:4]
    
    # Assign the labs column to a vector
    labs = df$labs
    
    d = dist(as.matrix(df.nw))                          # find distance matrix 
    hc = hclust(d, method="complete")                   # apply hierarchical clustering 
    
    ## plot the dendrogram
    
    plot(hc, hang=-0.01, cex=0.6, labels=labs, xlab="") 
    
    ## convert hclust to dendrogram 
    hcd = as.dendrogram(hc)                             
    
    ## plot using dendrogram object
    plot(hcd, cex=0.6)                                  
    
    Var = df$var                                        # factor variable for colours
    varCol = gsub("1","red",Var)                        # convert numbers to colours
    varCol = gsub("2","blue",varCol)
    
    # colour-code dendrogram labels by a factor
    colLab <- function(n) {
      if(is.leaf(n)) {
        a <- attributes(n)
        attr(n, "label") <- labs[a$label]
        attr(n, "nodePar") <- c(a$nodePar, lab.col = varCol[a$label]) 
      }
      n
    }
    
    ## Coloured plot
    plot(dendrapply(hcd, colLab))
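
    The snippet above indexes labs and varCol by a$label, which works when the leaf labels are row positions. With a matrix that has row names, such as file names, the labels are character strings, so a lookup by name may be safer. Below is a minimal base-R sketch of that variant; the matrix m, the vector ms_groups and the colors in group_cols are hypothetical stand-ins, not the asker's data.

```r
# Toy matrix whose row names stand in for the file names (hypothetical data)
set.seed(1)
m <- matrix(rnorm(6 * 4), nrow = 6,
            dimnames = list(paste0("file", 1:6), NULL))

# Hypothetical group per row, named by row name
ms_groups <- setNames(c("A", "A", "B", "B", "C", "C"), rownames(m))
# Hypothetical mapping from group to color
group_cols <- c(A = "red", B = "blue", C = "darkgreen")

dend <- as.dendrogram(hclust(dist(m)))

# Set the label color of each leaf from its group, looked up by label name
col_by_group <- function(n) {
  if (is.leaf(n)) {
    lab <- attr(n, "label")
    attr(n, "nodePar") <- c(attr(n, "nodePar"),
                            list(lab.col = unname(group_cols[ms_groups[lab]]),
                                 pch = NA))  # pch = NA hides the leaf points
  }
  n
}

dend_col <- dendrapply(dend, col_by_group)
plot(dend_col)
```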
    
  • 2021-01-13 04:31

    You may take a look at this tutorial, which shows several ways of visualizing dendrograms in R by groups:

    https://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html

    However, I think the best solution for your data is offered by the package 'dendextend'. See the tutorial (the example on the 'iris' dataset is similar to your problem): https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf

    See also the vignette: http://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
