dendrogram

How to put labels on the edges in the Dendrogram example?

江枫思渺然 posted on 2019-12-03 12:02:52
Given a tree diagram like the Dendrogram example (source), how would one put labels on the edges? The JavaScript code that draws the edges looks like this:

    var link = vis.selectAll("path.link")
        .data(cluster.links(nodes))
        .enter().append("path")
        .attr("class", "link")
        .attr("d", diagonal);

Mike Bostock, the author of D3, very graciously helped with the following solution. Define a style for g.link; I just copied the style for g.node. Then I replaced the "var link = ..." code with the following. The x and y functions place the label in the center of the path.

    var linkg = vis.selectAll(

Converting ndarray generated by hcluster into a Newick string for use with ete2 package

大憨熊 posted on 2019-12-03 10:09:06
Question: I have a list of vectors created by running:

    import hcluster
    import numpy as np
    from ete2 import Tree
    vecs = [np.array(i) for i in document_list]

where document_list is a collection of web documents I am analysing. I then perform hierarchical clustering:

    Z = hcluster.linkage(vecs, metric='cosine')

This generates an ndarray such as:

    [[ 12. 19. 0. 1. ]
     [ 15. 21. 0. 3. ]
     [ 18. 22. 0. 4. ]
     [ 3. 16. 0. 7. ]
     [ 8. 23. 0. 6. ]
     [ 5. 27. 0. 6. ]
     [ 1. 28. 0. 7. ]
     [ 0. 21. 0. 2. ]
     [ 5. 29. 0.18350472 2.
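The excerpt above stops before any answer, so the following is only a minimal sketch of one way to do the conversion, not the poster's or any library's method. It assumes the linkage matrix follows the SciPy-style layout (row i is left id, right id, distance, size, and the cluster formed by row i gets the id n + i). The function name linkage_to_newick and the choice of branch length (parent distance minus child distance) are illustrative assumptions.

```python
def linkage_to_newick(Z, leaf_names):
    """Convert a SciPy-style linkage matrix into a Newick string.

    Assumption: row i of Z is (left_id, right_id, distance, size) and the
    cluster created by row i receives the id len(leaf_names) + i.
    Z may be a list of rows or a 2-D NumPy array.
    """
    n = len(leaf_names)
    # Each entry maps a cluster id to (newick_text, merge_distance).
    nodes = {i: (str(name), 0.0) for i, name in enumerate(leaf_names)}
    for i, row in enumerate(Z):
        left, right, dist = int(row[0]), int(row[1]), float(row[2])
        parts = []
        for child in (left, right):
            text, child_dist = nodes.pop(child)
            # Branch length: how far the parent sits above this child.
            parts.append("%s:%g" % (text, dist - child_dist))
        nodes[n + i] = ("(%s)" % ",".join(parts), dist)
    # The last row of the matrix creates the root cluster.
    root_text, _ = nodes[n + len(Z) - 1]
    return root_text + ";"


# Tiny hand-made example (not the data from the question).
example = [[0, 1, 1.0, 2], [2, 3, 3.0, 3]]
print(linkage_to_newick(example, ["a", "b", "c"]))
```

The resulting string could then be handed to a Newick parser such as the Tree class imported in the question, provided the leaf names contain no characters that are special in Newick.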

Label and color leaf dendrogram (phylogeny) in R using ape package

自作多情 posted on 2019-12-03 09:55:51
Following a previous post (Label and color leaf dendrogram in r) I have a follow-up question. My questions are similar to those in the post mentioned, but I wonder whether it can be done using ape (e.g., plot(as.phylo(fit), type="fan", labelCol)), as it has more types of phylogeny. The questions in the mentioned post were: How can I show the group codes in the leaf labels (instead of the sample number)? I wish to assign a color to each code group and color the leaf labels according to it (it might happen that they will not be in the same clade, and by that I can find more information). And the code sample is: sample = data

R cut dendrogram into groups with minimum size

穿精又带淫゛_ posted on 2019-12-03 08:17:31
Is there an easy way to calculate the lowest value of h in cut that produces groupings of a given minimum size? In this example, if I wanted clusters with at least ten members each, I should go with h = 3.80:

    # using iris data simply for reproducible example
    data(iris)
    d <- data.frame(scale(iris[,1:4]))
    hc <- hclust(dist(d))
    plot(hc)
    cut(as.dendrogram(hc), h=3.79) # produces 5 groups; group 4 has 7 members
    cut(as.dendrogram(hc), h=3.80) # produces 4 groups; no group has <10 members

Since the heights of the splits are given in hc$height, I could create a set of candidate values using hc$height +
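The idea of scanning the merge heights as candidates can be sketched language-independently. The Python sketch below is not the R answer to the question; it assumes the merges are available in SciPy-style order (row i joins two cluster ids at a given height, creating id n + i) and that the heights are non-decreasing. The function name min_height_for_cluster_size is hypothetical. Because merging only ever grows clusters, the smallest cluster size never decreases as the height rises, so the first height that passes the check is the lowest one.

```python
def min_height_for_cluster_size(merges, n_leaves, min_size):
    """Return the lowest merge height at which every cluster has at least
    min_size members, or None if that never happens.

    Assumption: merges is a sequence of (left_id, right_id, height) rows in
    SciPy linkage order, with heights sorted in non-decreasing order.
    """
    sizes = {i: 1 for i in range(n_leaves)}  # every leaf starts alone
    for i, (left, right, height) in enumerate(merges):
        # Joining two clusters removes them and adds their union.
        sizes[n_leaves + i] = sizes.pop(int(left)) + sizes.pop(int(right))
        if min(sizes.values()) >= min_size:
            return height
    return None


# Hand-made example with four leaves (not the iris data).
merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 5.0)]
print(min_height_for_cluster_size(merges, 4, 2))
```

Whether a cut made exactly at the returned height already includes that merge depends on the tool's convention, which is presumably why the question adds a small offset to hc$height.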

Extract labels membership / classification from a cut dendrogram in R (i.e.: a cutree function for dendrogram)

你离开我真会死。 posted on 2019-12-03 06:01:19
Question: I'm trying to extract a classification from a dendrogram in R that I've cut at a certain height. This is easy to do with cutree on an hclust object, but I can't figure out how to do it on a dendrogram object. Further, I can't just use my clusters from the original hclust, because (frustratingly) the numbering of the classes from cutree is different from the numbering of classes with cut.

    hc <- hclust(dist(USArrests), "ave")
    classification <- cutree(hc, h=70)
    dend1 <- as.dendrogram(hc)
    dend2 <-

How to build a dendrogram from a directory tree?

六眼飞鱼酱① posted on 2019-12-03 05:59:05
Question: Given an absolute path to a root directory, how do I generate a dendrogram object of all paths below it so that I can visualize the directory tree with R? Suppose the following call returned the following leaf nodes.

    list.files(path, full.names = TRUE, recursive = TRUE)

    root/a/some/file.R
    root/a/another/file.R
    root/a/another/cool/file.R
    root/b/some/data.csv
    root/b/more/data.csv

I'd like to make a plot in R like the output of the unix tree program:

    root
    ├── a
    │   ├── another
    │   │   ├── cool
    │   │   │   └──
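The underlying step, turning a flat list of file paths into a nested tree and drawing it, can be sketched independently of R. The Python sketch below is an illustration only, not the R dendrogram solution the question asks for; the helper names build_tree and render_tree are hypothetical, and the paths are the five listed in the question.

```python
def build_tree(paths):
    """Nest '/'-separated paths into a dict of dicts (files map to empty dicts)."""
    root = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(tree, prefix=""):
    """Return lines drawn in the style of the unix tree program."""
    lines = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children are indented; a vertical bar continues unfinished branches.
        lines.extend(render_tree(tree[name], prefix + ("    " if last else "│   ")))
    return lines


paths = [
    "root/a/some/file.R",
    "root/a/another/file.R",
    "root/a/another/cool/file.R",
    "root/b/some/data.csv",
    "root/b/more/data.csv",
]
tree = build_tree(paths)
print("root")
print("\n".join(render_tree(tree["root"])))
```

The nested dict is the part that carries over to other tools: any dendrogram-like structure needs the same parent/child grouping before it can be plotted.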

cluster presentation dendrogram alternative in r

寵の児 posted on 2019-12-03 05:19:49
Question: I know dendrograms are quite popular. However, if there is a large number of observations and classes, they are hard to follow, and I sometimes feel that there should be a better way to present the same thing. I have an idea but do not know how to implement it. Consider the following dendrogram.

    > data(mtcars)
    > plot(hclust(dist(mtcars)))

Can it be plotted like a scatter plot, in which the distance between two points is drawn as a line, while separate clusters (at an assumed threshold) are colored and

Extracting clusters from seaborn clustermap

风流意气都作罢 posted on 2019-12-03 02:29:28
Question: I am using the seaborn clustermap to create clusters, and visually it works great (this example produces very similar results). However, I am having trouble figuring out how to programmatically extract the clusters. For instance, in the example link, how could I find out that 1-1 rh, 1-1 lh, 5-1 rh, 5-1 lh make a good cluster? Visually it's easy. I am trying to use methods of looking through the data and the dendrograms, but I'm having little success.

EDIT Code from example: import pandas as pd

How to plot dendrograms with large datasets?

假如想象 posted on 2019-12-03 02:22:03
Question: I am using the ape (Analyses of Phylogenetics and Evolution) package in R, which has dendrogram drawing functionality. I use the following commands to read the data in Newick format and draw a dendrogram using the plot function:

    library("ape")
    gcPhylo <- read.tree(file = "gc.tree")
    plot(gcPhylo, show.node.label = TRUE)

As the data set is quite large, it is impossible to see any details in the lower levels of the tree. I can see just black areas but no details. I can only see a few levels from the top,

cluster presentation dendrogram alternative in r

浪尽此生 posted on 2019-12-02 18:36:54
I know dendrograms are quite popular. However, if there is a large number of observations and classes, they are hard to follow, and I sometimes feel that there should be a better way to present the same thing. I have an idea but do not know how to implement it. Consider the following dendrogram.

    > data(mtcars)
    > plot(hclust(dist(mtcars)))

Can it be plotted like a scatter plot, in which the distance between two points is drawn as a line, while separate clusters (at an assumed threshold) are colored and circle size is determined by the value of some variable?

You are describing a fairly typical way of going about