问题
I have a list of vectors created by running:
import hcluster
import numpy as np
from ete2 import Tree
vecs = [np.array(i) for i in document_list]
where document_list is a collection of web documents I am analysing. I then perform hierarchical clustering:
Z = hcluster.linkage(vecs, metric='cosine')
This generates an ndarray such as:
[[ 12. 19. 0. 1. ]
[ 15. 21. 0. 3. ]
[ 18. 22. 0. 4. ]
[ 3. 16. 0. 7. ]
[ 8. 23. 0. 6. ]
[ 5. 27. 0. 6. ]
[ 1. 28. 0. 7. ]
[ 0. 21. 0. 2. ]
[ 5. 29. 0.18350472 2. ]
[ 2. 10. 0.18350472 3. ]
[ 47. 30. 0.29289577 9. ]
[ 13. 28. 0.29289577 13. ]
[ 73. 32. 0.29289577 18. ]
[ 26. 12. 0.42264521 5. ]
[ 5. 33. 0.42264521 12. ]
[ 14. 35. 0.42264521 12. ]
[ 19. 35. 0.42264521 18. ]
[ 4. 20. 0.31174826 3. ]
[ 34. 21. 0.5 19. ]
[ 38. 29. 0.31174826 21. ]]
Is it possible to convert this ndarray into a newick string that can be passed to the ete2 Tree() constructor so that I can draw and manipulate a newick tree using the tools provided by ete2?
Does it even make sense to try and do this and if not is there another way that I can generate a tree/dendrogram using the same data and ete2 (I realise that there are other packages that can draw dendrograms such as dendropy and hcluster itself but would prefer to use ete2 all the same)?
Thanks!
回答1:
I use the following approach for pretty much the same thing:
from hcluster import linkage, to_tree
from ete2 import Tree
#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")
T = to_tree(Z)
#ete2 section
root = Tree()
root.dist = 0
root.name = "root"
item2node = {T: root}
to_visit = [T]
while to_visit:
node = to_visit.pop()
cl_dist = node.dist /2.0
for ch_node in [node.left, node.right]:
if ch_node:
ch = Tree()
ch.dist = cl_dist
ch.name = str(ch_node.id)
item2node[node].add_child(ch)
item2node[ch_node] = ch
to_visit.append(ch_node)
# This is your ETE tree structure
tree = root
回答2:
Update:
from hcluster import linkage, to_tree
from ete2 import Tree
#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")
R,T = to_tree( mat, rd=True )
#print "ROOT", R, "TREE", T
root = Tree()
root.dist = 0
root.name = 'root'
item2node = {R.get_id(): root}
to_visit = T
while to_visit:
node = to_visit.pop()
#print "NODE", node
cl_dist = node.dist / 2.0
for ch_node in [node.get_left(), node.get_right()]:
if ch_node:
ch_node_id = ch_node.get_id()
ch_node_name = str(ch_node_id)
ch = Tree()
ch.dist = cl_dist
ch.name = ch_node_name
if nodeNames:
if ch_node_id < len(nodeNames):
ch.name = nodeNames[ ch_node_id ]
item2node[ch_node_id] = ch
item2node[ch_node_id].add_child(ch)
to_visit.append(ch_node)
来源:https://stackoverflow.com/questions/9364609/converting-ndarray-generated-by-hcluster-into-a-newick-string-for-use-with-ete2