In R, how can I plot a similarity matrix (like a block graph) after clustering data?

死守一世寂寞 2021-02-04 21:42

I want to produce a graph that shows the correlation between clustered data and a similarity matrix. How can I do this in R? Is there any function in R that creates a graph like a

1 Answer
  •  一个人的身影
    2021-02-04 22:24

    The general solutions suggested in the comments by @Chase and @bill_080 need a little bit of enhancement to (partially) fulfil the needs of the OP.

    A reproducible example:

    library(MASS)  # provides mvrnorm() for multivariate normal draws
    set.seed(1)
    dat <- data.frame(mvrnorm(100, mu = c(2,6,3), 
                              Sigma = matrix(c(10,   2,   4,
                                                2,   3, 0.5,
                                                4, 0.5,   2), ncol = 3)))
    

    Compute the dissimilarity matrix of the standardised data using Euclidean distances

    dij <- dist(scale(dat, center = TRUE, scale = TRUE))
    

    and then calculate a hierarchical clustering of these data using the group average method

    clust <- hclust(dij, method = "average")
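To make the two steps above concrete, here is a small sketch on hypothetical toy data (the `toy` data frame, the points 0, 1 and 5, and all variable names are made up for illustration). Part 1 checks that `dist(scale(...))` is simply the Euclidean distance between column z-scores; part 2 shows how `method = "average"` differs from single and complete linkage in the height at which clusters are merged.

```r
# Part 1: what dist(scale(...)) computes, on hypothetical toy data
toy <- data.frame(height_cm = c(150, 160, 170, 180),
                  weight_kg = c(50, 65, 60, 80))
z <- scale(toy, center = TRUE, scale = TRUE)   # column z-scores
d12_manual <- sqrt(sum((z[1, ] - z[2, ])^2))   # Euclidean distance, samples 1 and 2
d12_dist <- as.matrix(dist(z))[1, 2]           # the same pair taken from dist()
all.equal(unname(d12_manual), unname(d12_dist))

# Part 2: how the linkage rule sets merge heights, on three points at 0, 1 and 5
# Pairwise distances: d(1,2) = 1, d(1,3) = 5, d(2,3) = 4
d <- dist(c(0, 1, 5))
heights <- sapply(c("single", "complete", "average"),
                  function(m) hclust(d, method = m)$height)
heights
```

All three rules first join samples 1 and 2 at height 1. The second merge happens at 4 for single linkage (closest pair), 5 for complete linkage (farthest pair) and 4.5 for average linkage (the mean of 5 and 4).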
    

    Next, we compute the ordering of the samples on the basis of forming 3 (k) groups from the dendrogram, but we could have chosen a different k here.

    ord <- order(cutree(clust, k = 3))
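A sketch of what the `order(cutree(...))` step does, assuming the same simulated data and clustering as above (the setup is repeated so the block runs on its own; `grp` and `leaf` are illustrative names). It also compares this ordering with `clust$order`, the dendrogram's own leaf order, as one possible alternative: both keep each of the 3 groups in a single contiguous block, but `clust$order` also arranges samples within a group as they appear in the tree, whereas `order()` leaves tied samples in their original index order.

```r
library(MASS)  # for mvrnorm()

# Rebuild the simulated data, distances and clustering used in the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp  <- cutree(clust, k = 3)  # group label (1, 2 or 3) for every sample
ord  <- order(grp)            # sample indices sorted by group; ties keep index order
leaf <- clust$order           # left-to-right leaf order of the dendrogram

table(grp)  # group sizes

# Count contiguous runs of group labels under each ordering:
# 3 runs means each group occupies a single block in the image plots
length(rle(as.vector(grp[ord]))$lengths)
length(rle(as.vector(grp[leaf]))$lengths)
```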
    

    Next, compute the dissimilarities between samples implied by the dendrogram, i.e. the cophenetic distances:

    coph <- cophenetic(clust)
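For intuition about cophenetic distances, here is a minimal sketch reusing hypothetical points at 0, 1 and 5 (not data from the question; `d`, `hc` and `coph_toy` are illustrative names). Under average linkage, samples 1 and 2 join at height 1 and sample 3 joins them at 4.5, so every pair involving sample 3 gets a cophenetic distance of 4.5 even though the original distances are 5 and 4.

```r
# Hypothetical toy data: three samples on one axis at 0, 1 and 5
d <- dist(c(0, 1, 5))                # original distances (1,2) = 1, (1,3) = 5, (2,3) = 4
hc <- hclust(d, method = "average")

# The cophenetic distance between two samples is the height of the merge
# at which they first end up in the same cluster
coph_toy <- cophenetic(hc)
as.matrix(coph_toy)

# Per-pair difference, analogous to the "Cophenetic - Original" panel
as.matrix(coph_toy) - as.matrix(d)
```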
    

    Here are four plots, three image plots and one scatterplot, of:

    1. The original dissimilarity matrix, sorted on the basis of the cluster-analysis groupings,
    2. The cophenetic distances, again sorted as above,
    3. The difference between the cophenetic distances and the original dissimilarities (cophenetic minus original),
    4. A Shepard-like plot comparing the original and cophenetic distances; the better the clustering captures the original distances, the closer the points will lie to the 1:1 line.

    Here is the code that produces these plots:

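    layout(matrix(1:4, ncol = 2))  # 2 x 2 grid of panels, filled column-wise
    # Panels 1-3: image plots with samples sorted by cluster membership
    image(as.matrix(dij)[ord, ord], main = "Original distances")
    image(as.matrix(coph)[ord, ord], main = "Cophenetic distances")
    image((as.matrix(coph) - as.matrix(dij))[ord, ord], 
          main = "Cophenetic - Original")
    # Panel 4: Shepard-like plot of cophenetic vs. original distances
    plot(coph ~ dij, ylab = "Cophenetic distances", xlab = "Original distances",
         main = "Shepard Plot")
    abline(0, 1, col = "red")  # 1:1 reference line
    box()
    layout(1)  # restore a single-panel layout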
    

    Which produces this on the active device:

    [Figure: plots of original and cophenetic distances]
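If a single block-style picture with the dendrogram attached is closer to what the question asks for, base R's `stats::heatmap()` can draw one. This is a swapped-in alternative, not the method used above: it sorts rows and columns by the dendrogram's leaf order rather than by `order(cutree(...))`. A sketch, assuming the same simulated data (setup repeated so the block is self-contained; `dend` and `hm` are illustrative names):

```r
library(MASS)  # for mvrnorm()

# Rebuild the simulated data and clustering used in the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

# Reuse the existing tree so heatmap() does not re-cluster the matrix
dend <- as.dendrogram(clust)

# symm = TRUE treats the matrix as symmetric and uses the same order for
# rows and columns; scale = "none" plots the raw distances
hm <- heatmap(as.matrix(dij), Rowv = dend, symm = TRUE, scale = "none",
              main = "Original distances, dendrogram order")

# Row order actually used in the plot (a permutation of 1:100)
head(hm$rowInd)
```

Passing the existing dendrogram as `Rowv` keeps the tree from the answer, and `scale = "none"` avoids row-standardising the distances before they are coloured.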

    Having said all that, however, only the Shepard plot shows the "correlation between clustered data and [dis]similarity matrix", and that is not an image plot (levelplot). How would you propose to compute the correlation between two numbers for all pairwise comparisons of cophenetic and original [dis]similarities?
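One common way around that problem is to summarise all pairs at once rather than per pair: the cophenetic correlation coefficient is the Pearson correlation between the original dissimilarities and the cophenetic distances, i.e. the correlation underlying the Shepard plot. A sketch, assuming the objects from the answer (setup repeated so it runs alone; `cc` and `linkages` are illustrative names, and the comparison across linkage methods is an added illustration):

```r
library(MASS)  # for mvrnorm()

# Rebuild the objects used in the answer
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph <- cophenetic(clust)

# Cophenetic correlation: cor() on two "dist" objects correlates their
# n(n - 1)/2 pairwise entries, giving a one-number summary of the Shepard plot
cc <- cor(dij, coph)
cc

# The same summary for a few other linkage methods, for comparison
linkages <- c("single", "complete", "average", "ward.D2")
sapply(linkages, function(m) cor(dij, cophenetic(hclust(dij, method = m))))
```

Values closer to 1 mean the dendrogram preserves the original distances more faithfully; the linkage comparison is only a rough guide, since a higher cophenetic correlation does not by itself make a clustering more useful.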
