Working with a phylogenetic tree in R, I would like to create a matrix which indicates if each branch of the tree (B1 to B8) is associated with each species (A to E), where 1s indicate that the branch is associated. (Shown below)
The R function which.edge() is useful for identifying the terminal branch for a species. but it doesn't identify ALL the branches associated with each species. What function could I use to identify all the branches in the tree that go from the root to the tip for each species?
Example Tree
library(ape)
ex.tree <- read.tree(text="(A:4,((B:1,C:1):2,(D:2,E:2):1):1);")
plot(ex.tree)
edgelabels() #shows branches 1-8
The is the matrix I would like to create (Species A-E as columns, Branches B1-B8 as rows), but with an easy function rather than by hand.
B1 <- c(1,0,0,0,0)
B2 <- c(0,1,1,1,1)
B3 <- c(0,1,1,0,0)
B4 <- c(0,1,0,0,0)
B5 <- c(0,0,1,0,0)
B6 <- c(0,0,0,1,1)
B7 <- c(0,0,0,1,0)
B8 <- c(0,0,0,0,1)
Mat <- rbind(B1,B2,B3,B4,B5,B6,B7,B8)
colnames(Mat) <- c("A","B","C","D","E")
Mat
For example, Branch B2 goes to species B-E, but not to species A. For Species E, branches B2, B6, B8 are present.
Which R function(s) would be best? Thanks in advance!
I am unaware of any built-in function that does this. I wrote a helper function that can calculate this from the edge data stored in the tree
object.
branchNodeAdjacency <- function(x) {
m <- matrix(0, ncol=nt, nrow=nrow(x$edge))
from <- x$edge[,1]
to <- x$edge[,2]
g <- seq_along(x$tip.label)
while (any(!is.na(g))) {
i <- match(g, to)
m[cbind(i, seq_along(i))] <- 1
g <- from[i]
}
rownames(m) <- paste0("B", seq.int(nrow(m)))
colnames(m) <- x$tip.label
m
}
branchNodeAdjacency(ex.tree)
# A B C D E
# B1 1 0 0 0 0
# B2 0 1 1 1 1
# B3 0 1 1 0 0
# B4 0 1 0 0 0
# B5 0 0 1 0 0
# B6 0 0 0 1 1
# B7 0 0 0 1 0
# B8 0 0 0 0 1
The idea is we keep track of which leaf node values are represented by each internal node.
来源:https://stackoverflow.com/questions/33618821/phylogenetic-tree-how-to-create-a-branch-by-species-matrix