I want to produce a graph that shows the correlation between clustered data and a similarity matrix. How can I do this in R? Is there any function in R that creates the graph like a
The general solutions suggested in the comments by @Chase and @bill_080 need some enhancement to (partially) meet the OP's needs.
A reproducible example:
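library(MASS)  # provides mvrnorm()
set.seed(1)    # make the random draws reproducible
# 100 samples from a trivariate normal with means 2, 6, 3 and covariance Sigma
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2,   4,
                                           2,  3,   0.5,
                                           4,  0.5, 2), ncol = 3)))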
Compute the dissimilarity matrix of the standardised data using Euclidean distances:
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
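To make concrete what dist() returns here, the sketch below re-creates the data (so it runs on its own) and checks one entry by hand: the distance between samples 1 and 2 is the square root of the summed squared differences of their standardised values. The names z and d12 are illustrative only.

```r
library(MASS)

# Re-create the example data so this snippet is self-contained
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2, 4,
                                           2, 3, 0.5,
                                           4, 0.5, 2), ncol = 3)))

z <- scale(dat, center = TRUE, scale = TRUE)  # each column now has mean 0, sd 1
dij <- dist(z)                                 # method = "euclidean" is the default

# Euclidean distance between samples 1 and 2, computed by hand
d12 <- sqrt(sum((z[1, ] - z[2, ])^2))

# as.matrix() expands the compact "dist" object into a full symmetric n x n matrix
all.equal(d12, as.matrix(dij)[1, 2])

# A "dist" object stores only the n(n - 1)/2 lower-triangle entries
length(dij)
```

Storing only the lower triangle is why the later code converts with as.matrix() before indexing with [ord, ord].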
Then calculate a hierarchical clustering of these data using the group-average (UPGMA) linkage method:
clust <- hclust(dij, method = "average")
Next, compute an ordering of the samples by cutting the dendrogram into 3 ('k') groups, although other choices are possible here.
ord <- order(cutree(clust, k = 3))
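A quick way to see what this ordering does is to look at the group labels from cutree() and draw the dendrogram with the 3 groups outlined. The sketch below re-creates the objects so it runs on its own; grp is an illustrative name, and plot() / rect.hclust() are used only as a visual check.

```r
library(MASS)

# Re-create the objects from the answer so this snippet is self-contained
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2, 4,
                                           2, 3, 0.5,
                                           4, 0.5, 2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp <- cutree(clust, k = 3)  # group label (1, 2 or 3) for each sample
table(grp)                   # group sizes
ord <- order(grp)            # sample indices sorted so members of a group are adjacent

# After reordering, the labels run in non-decreasing order: all 1s, then 2s, then 3s
grp[ord]

# Dendrogram with the 3 groups outlined
plot(clust, labels = FALSE, main = "Average-linkage dendrogram")
rect.hclust(clust, k = 3, border = "red")
```

Because order() breaks ties by original position, samples within a group keep their original relative order.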
Then compute the dissimilarities between samples implied by the dendrogram, i.e. the cophenetic distances:
coph <- cophenetic(clust)
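The cophenetic distance between two samples is the height at which they are first joined in the dendrogram, so it can only take values from the tree's merge heights. The sketch below checks two consequences of that on the example data: there are at most n - 1 distinct cophenetic values, and two samples separated by the final merge (i.e. in different groups when k = 2) have a cophenetic distance equal to the top merge height. The names cm, g2, i and j are illustrative.

```r
library(MASS)

# Re-create the objects from the answer so this snippet is self-contained
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2, 4,
                                           2, 3, 0.5,
                                           4, 0.5, 2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph <- cophenetic(clust)
cm <- as.matrix(coph)

# Every cophenetic distance is one of the n - 1 = 99 merge heights in the tree
length(unique(as.vector(coph)))

# Pick one sample from each side of the final (top) merge
g2 <- cutree(clust, k = 2)
i <- which(g2 == 1)[1]
j <- which(g2 == 2)[1]

# They are joined only at the top of the tree
all.equal(cm[i, j], max(clust$height))
```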
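The figure contains 3 image plots, of the original distances, the cophenetic distances, and their difference (cophenetic minus original), plus a Shepard plot of cophenetic against original distances. Here is the code that produces them: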
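layout(matrix(1:4, ncol = 2))  # 2 x 2 panel layout, filled column-wise
# Image plots of the distance matrices, rows and columns reordered by group
image(as.matrix(dij)[ord, ord], main = "Original distances")
image(as.matrix(coph)[ord, ord], main = "Cophenetic distances")
image((as.matrix(coph) - as.matrix(dij))[ord, ord],
      main = "Cophenetic - Original")
# Shepard plot: cophenetic vs original distance for every pair of samples
plot(coph ~ dij, ylab = "Cophenetic distances", xlab = "Original distances",
     main = "Shepard Plot")
abline(0, 1, col = "red")  # 1:1 line; pairs on it are reproduced exactly by the tree
box()
layout(1)  # restore a single-panel layout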
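Since the question mentions a levelplot, here is a sketch of the same reordered original-distance panel drawn with levelplot() from the lattice package (shipped with R as a recommended package) instead of base image(). This is a swapped-in alternative, not part of the code above; m and p are illustrative names.

```r
library(MASS)
library(lattice)

# Re-create the objects from the answer so this snippet is self-contained
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2, 4,
                                           2, 3, 0.5,
                                           4, 0.5, 2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
ord <- order(cutree(clust, k = 3))

m <- as.matrix(dij)[ord, ord]  # full distance matrix, samples grouped by cluster
p <- levelplot(m, main = "Original distances (ordered by group)",
               xlab = "Sample", ylab = "Sample",
               scales = list(draw = FALSE))  # 100 tick labels would be unreadable
print(p)  # lattice plots must be printed explicitly inside scripts and functions
```

The same call with as.matrix(coph)[ord, ord] gives the cophenetic panel.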
This draws the four panels in a 2-by-2 layout on the active graphics device.
That said, only the Shepard plot actually shows the "correlation between clustered data and [dis]similarity matrix", and it is not an image plot (levelplot). How would you propose to compute a correlation from just two numbers, for each pairwise comparison of cophenetic and original [dis]similarities?
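If a single number summarising the Shepard plot is wanted, one common choice is the cophenetic correlation coefficient: the Pearson correlation between all n(n - 1)/2 original distances and the matching cophenetic distances. The sketch below computes it for the example, and, as an illustration only, for a few other linkage methods on the same distances; cc and ccs are illustrative names.

```r
library(MASS)

# Re-create the objects from the answer so this snippet is self-contained
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10, 2, 4,
                                           2, 3, 0.5,
                                           4, 0.5, 2), ncol = 3)))
dij <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph <- cophenetic(clust)

# Both "dist" objects list the pairs in the same order, so their values line up
cc <- cor(as.vector(dij), as.vector(coph))
cc

# The same summary for other linkage methods, for comparison
methods <- c("single", "complete", "average", "ward.D2")
ccs <- sapply(methods, function(m)
  cor(as.vector(dij), as.vector(cophenetic(hclust(dij, method = m)))))
ccs
```

Values closer to 1 mean the dendrogram preserves the original pairwise distances more faithfully, i.e. the Shepard plot points lie closer to a straight line.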