问题
I have a large graph (several, actually) in igraph—on the order of 100,000 vertices—and each vertex has an attribute which is either true
or false
. For each vertex, I would like to count how many of the vertices directly connected to it have the attribute. My current solution is the following function, which takes as its argument a graph.
attrcount <- function(g) {
nb <- neighborhood(g,order=1)
return(sapply(nb,function(x) {sum(V(g)$attr[x]}))
}
This returns a vector of counts which is off by 1 for vertices which have the attribute, but I can adjust this easily.
The problem is that this runs incredibly slowly, and it seems like there should be a fast way to do this, since, for instance, computing the degree of each vertex is practically instantaneous with degree(g)
.
Am I doing this a stupid way?
As an example, suppose this was our graph.
set.seed(42)
g <- erdos.renyi.game(169081, 178058, type="gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))
回答1:
Use get.adjlist
to query all adjacent vertices, and then sapply
(or tapply
might be even faster) on this list to get the counts. It is also worth storing the attribute in a vector, because then you don't need to extract it all the time.
With sapply
system.time({
al <- get.adjlist(g)
att <- V(g)$att
res <- sapply(al, function(x) sum(att[x]))
})
# user system elapsed
# 0.571 0.005 0.576
With tapply
system.time({
al <- get.adjlist(g)
alv <- unlist(al)
alf <- factor(rep(seq_along(al), sapply(al, length)),
levels=seq_along(al))
att <- V(g)$att
res2 <- tapply(att[alv], alf, sum)
res2[is.na(res2)] <- 0
})
# user system elapsed
# 1.121 0.020 1.144
all(res == res2)
# TRUE
Somewhat a surprise to me, but the tapply
solution is actually slower.
If this is still not enough, then I guess you can still make it faster by writing it in C/C++.
回答2:
For faster computation, use get.adjacency
to pull the adjacency matrix, then multiply the matrix by the attribute vector using %*%
:
library(igraph)
set.seed(42)
g <- erdos.renyi.game(1000, 1000, type = "gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))
system.time({
ma <- get.adjacency(g)
att <- V(g)$att
res1 <- as.numeric(ma %*% att)
})
# user system elapsed
# 0.003 0.000 0.003
Compared to using get.adjlist
and sapply
:
system.time({
al <- get.adjlist(g)
att <- V(g)$att
res2 <- sapply(al, function(x) sum(att[x]))
})
# user system elapsed
# 9.733 0.243 10.107
After modifying the class of res1
, the results vector is identical:
res1 <- as.numeric(res1)
identical(res1, res2)
# [1] TRUE
来源:https://stackoverflow.com/questions/23155365/count-how-many-vertices-in-a-vertexs-neighbourhood-have-an-attribute-in-igraph