How can I create a dendrogram in R using pre-clustered data created elsewhere?

Posted by 别等时光非礼了梦想 on 2019-12-12 18:15:17

Question


I have clustering code written in Java, from which I can create a nested tree structure. For example, the following shows a tiny piece of the tree, in which the two "isRetired" objects were clustered in the first iteration and that group was then clustered with "setIsRetired" in the fifth iteration. The distance between the objects in each cluster is shown in parentheses.

  |+5 (dist. = 0.0438171125324851)
    |+1 (dist. = 2.220446049250313E-16)
      |-isRetired
      |-isRetired
    |-setIsRetired

I would prefer to present my results in a more traditional dendrogram style, and it looks like R has some nice capabilities, but because I know very little about R, I am unclear on how to take advantage of them.

Is it possible for me to write out a tree structure to a file from Java, and then, with a few lines of R code, produce a dendrogram? From the R program, I'd like to do something like:

  1. Read from a file into a data structure (an "hclust" object?)
  2. Convert the data structure into a dendrogram (using "as.dendrogram"?)
  3. Display the dendrogram using "plot"

I guess the question boils down to whether R provides an easy way of reading from a file and converting that string input into an (hclust) object. If so, what should the data in the input file look like?


Answer 1:


I think what you are looking for is phylog. You can write your tree to a file in Newick notation, read it back in R, and construct a phylog object, which you can easily visualize. The end of the webpage gives an example of how to do this. You might also want to consider phylobase. Although you don't need the entire functionality provided by these packages, you can piggyback on the constructs they use to represent trees and on their plotting capabilities.

EDIT: It looks like a similar question to yours has been asked before here, with a simpler solution. So basically the only thing you will have to code here is your Newick parser or a parser for any other representation you want to output from Java.
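On the Java side, the part that has to be written is the code that emits the tree as Newick text. The following is a minimal sketch of that step, not the asker's actual code. The `Node` class, the `toNewick` method, and the cluster labels `n1` and `n5` are hypothetical names chosen for illustration. It assumes that each cluster's merge distance is used as its height, that leaves sit at height 0, and that a branch length is the parent's height minus the child's height. It also assumes that names contain no characters with special meaning in Newick (spaces, parentheses, commas, colons, semicolons).

```java
import java.util.ArrayList;
import java.util.List;

public class NewickWriter {

    /** Hypothetical tree node: a leaf has no children; a cluster has children and a merge distance. */
    static final class Node {
        final String name;    // leaf label, or an optional cluster label such as "n5"
        final double height;  // merge distance for a cluster, 0.0 for a leaf (assumption)
        final List<Node> children = new ArrayList<>();

        Node(String name, double height, Node... kids) {
            this.name = name;
            this.height = height;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Serializes the whole tree as one Newick string terminated by ';'. */
    static String toNewick(Node root) {
        StringBuilder sb = new StringBuilder();
        write(root, null, sb);
        return sb.append(';').toString();
    }

    private static void write(Node node, Node parent, StringBuilder sb) {
        if (!node.children.isEmpty()) {
            // Internal node: comma-separated children wrapped in parentheses.
            sb.append('(');
            for (int i = 0; i < node.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                write(node.children.get(i), node, sb);
            }
            sb.append(')');
        }
        sb.append(node.name);
        if (parent != null) {
            // Branch length = parent height - own height; the root gets none.
            sb.append(':').append(parent.height - node.height);
        }
    }

    public static void main(String[] args) {
        // The tree fragment from the question, with hypothetical cluster labels.
        Node inner = new Node("n1", 2.220446049250313E-16,
                new Node("isRetired", 0.0), new Node("isRetired", 0.0));
        Node root = new Node("n5", 0.0438171125324851,
                inner, new Node("setIsRetired", 0.0));
        System.out.println(toNewick(root));
    }
}
```

Running `main` prints a single line of the form `((isRetired:…,isRetired:…)n1:…,setIsRetired:…)n5;`. Saving that line to a file gives the R side something to read. Whether a particular R reader accepts duplicated leaf names or branch lengths in scientific notation is worth checking against that reader's documentation.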




Answer 2:


The ape (Analyses of Phylogenetics and Evolution) package contains dendrogram drawing functionality and can read trees in Newick format. It is not part of base R, so you need to install it first, e.g. with install.packages("ape"). In principle it is easy to use; the following commands produce a dendrogram:

> library("ape")
> gcPhylo <- read.tree(file = "gc.tree")
> plot(gcPhylo, show.node.label = TRUE)

My main complaint thus far is that there is little diagnostic information when something is wrong with the syntax of the Newick file. I've had success reading these same files with other tools (which, in some cases, may be because those tools are forgiving of certain syntax faults).

You can also produce a dendrogram with the phylog class from the ade4 package, as shown below.

> library(ade4)
> newickString <- paste(readLines("gc.tree"), collapse = "")
> gcPhylog <- newick2phylog(newickString)
> plot(gcPhylog, clabel.nodes=1)

Both can work with trees in Newick format and both have many plotting options.



Source: https://stackoverflow.com/questions/5957625/how-can-i-create-a-dendrogram-in-r-using-pre-clustered-data-created-elsewhere
