R: igraph, community detection, edge.betweenness method, count/list members of each community?


I have a relatively large graph (524 vertices, 1125 edges) of real-world transactions. The edges are directed and have a weight (inclusion is optional). I'm trying investig

2 answers

一生所求

2021-02-02 03:03

Regarding "how to control the number of communities" in the OP's question: I use the cut_at function on the communities object to cut the resulting hierarchical structure into a desired number of groups. I hope someone can confirm that this is sane. Consider the following:

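```
library(igraph)

# Generate a random graph
adj.mat <- matrix(NA, nrow = 200, ncol = 200)  # empty (all-NA) matrix
set.seed(2)

# Populate the adjacency matrix: each row gets a random set of neighbours
for (i in 1:200) { adj.mat[i, sample(rep(1:200), runif(1, 1, 100))] <- 1 }
adj.mat[which(is.na(adj.mat))] <- 0

# Remove self-loops
for (i in 1:200) {
  adj.mat[i, i] <- 0
}

# graph.adjacency() is called graph_from_adjacency_matrix() in newer igraph
G <- graph.adjacency(adj.mat, mode = 'undirected')
plot(G, vertex.label = NA)

# Find clusters
walktrap.comms <- cluster_walktrap(G, steps = 10)
max(walktrap.comms$membership)  # 43
walktrap.comms$membership       # community id of each vertex
```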

```
  [1]  6 34 13  1 19 19  3  9 20 29 12 26  9 28  9  9  2 14 13 14 27  9 33 17 22 23 23 10 17 31  9 21  2  1
 [35] 33 23  3 26 22 29  4 16 24 22 25 31 23 23 13 30 35 27 25 15  6 14  9  2 16  7 23  4 18 10 10 22 27 27
 [69] 23 31 27 32 36  8 23  6 23 14 19 22 19 37 27  6 27 22  9 14  4 22 14 32 33 27 26 14 21 27 22 12 20  7
[103] 14 26 38 39 26  3 14 23 22 14 40  9  5 19 29 31 26 26  2 19  6  9  1  9 23  4 14 11  9 22 23 41 10 27
[137] 22 18 26 14  8 15 27 10  5 33 21 28 23 22 13  1 22 24 14 18  8  2 18  1 27 12 22 34 13 27  3  5 27 25
[171]  1 27 13 34  8 10 13  5 17 17 25  6 19 42 31 13 30 32 15 30  5 11  9 25  6 33 18 33 43 10
```

Note that there are 43 groups, but we want coarser clusters, so examine the dendrogram:

```
plot(as.hclust(walktrap.comms), labels = FALSE)
```

Then cut based on it. I arbitrarily chose 6 groups; either way, you now have coarser clusters:

```
cut_at(walktrap.comms, no = 6)
```

```
  [1] 4 2 5 4 5 5 3 5 3 4 3 5 5 3 5 5 3 1 5 1 1 5 1 6 1 1 1 4 6 5 5 2 3 4 1 1 3 5 1 4 6 6 3 1 5 5 1 1 5 4 3 1
 [53] 5 2 4 1 5 3 6 3 1 6 6 4 4 1 1 1 1 5 1 4 3 3 1 4 1 1 5 1 5 2 1 4 1 1 5 1 6 1 1 4 1 1 5 1 2 1 1 3 3 3 1 5
[105] 3 3 5 3 1 1 1 1 3 5 2 5 4 5 5 5 3 5 4 5 4 5 1 6 1 3 5 1 1 1 4 1 1 6 5 1 3 2 1 4 2 1 2 3 1 1 5 4 1 3 1 6
[157] 3 3 6 4 1 3 1 2 5 1 3 2 1 5 4 1 5 2 3 4 5 2 6 6 5 4 5 3 5 5 4 4 2 4 2 3 5 5 4 1 6 1 2 4
```
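
To get from a cut membership vector to the counts and member lists the question asks about, something like the following may work. It is only a sketch: it uses igraph's built-in Zachary karate club graph rather than the random graph above, and the choice of 3 groups and the names `coarse`, `group_sizes` and `group_members` are arbitrary.

```r
library(igraph)

# Small built-in example graph (Zachary's karate club, 34 vertices);
# an assumption for illustration, not the graph from the answer above.
g <- make_graph("Zachary")

wt <- cluster_walktrap(g, steps = 10)

# Cut the walktrap dendrogram into 3 groups (3 is an arbitrary choice).
coarse <- cut_at(wt, no = 3)

# Number of vertices in each group.
group_sizes <- table(coarse)
print(group_sizes)

# Vertex ids belonging to each group, as a list keyed by group id.
group_members <- split(seq_len(vcount(g)), coarse)
print(group_members)
```

The same `table()` and `split()` calls should apply to the vector returned by `cut_at(walktrap.comms, no = 6)`.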


被撕碎了的回忆

2021-02-02 03:19
Several of these questions can be answered by reading the documentation of the functions you're using closely. For instance, the "Value" section of the documentation for clusters describes what the function returns, and a couple of those components answer your questions. Documentation aside, you can always use the str function to inspect the make-up of any particular object.

That being said, to get the members or numbers of members in a particular community, you can look at the membership object returned by the clusters function (which you're already using to assign color). So something like:
```
summary(clusters(all2)$membership)
```
would describe the IDs of the clusters that are being used. In the case of your sample data, it looks like you have clusters with the IDs ranging from 0 to 585, for 586 clusters in total. (Note that you won't be able to display those very accurately using the coloring scheme you're currently using.)

To determine the number of vertices in each cluster, you can look at the csize component also returned by clusters. In this case, it's a vector of length 586, storing one size for each cluster calculated. So you can use
```
clusters(all2)$csize
```
to get the list of sizes of your clusters. Be warned that your cluster IDs, as previously mentioned, start from 0 ("zero-indexed") whereas R vectors start from 1 ("one-indexed"), so you'll need to shift these indices by one. For instance, clusters(all2)$csize[5] returns the size of the cluster with the ID of 4. (This offset applies to igraph versions before 0.6; from 0.6 onward the R package numbers clusters from 1, so no shift is needed.)

To list the vertices in any cluster, you just want to find which IDs in the membership component previously mentioned match up to the cluster in question. So if I want to find the vertices in cluster #128 (there are 21 of these, according to clusters(all2)$csize[129]), I could use:
```
which(clusters(all2)$membership == 128)
length(which(clusters(all2)$membership == 128)) #21
```
and to retrieve the vertices in that cluster, I can use the V function and index it with the same membership test:
```
> V(all2)[clusters(all2)$membership == 128]
Vertex sequence:
 [1] "625591221 - Clare Clancy"           
 [2] "100000283016052 - Podge Mooney"     
 [3] "100000036003966 - Jennifer Cleary"  
 [4] "100000248002190 - Sarah Dowd"       
 [5] "100001269231766 - LirChild Surfwear"
 [6] "100000112732723 - Stephen Howard"   
 [7] "100000136545396 - Ciaran O Hanlon"  
 [8] "1666181940 - Evion Grizewald"       
 [9] "100000079324233 - Johanna Delaney"  
[10] "100000097126561 - Órlaith Murphy"   
[11] "100000130390840 - Julieann Evans"   
[12] "100000216769732 - Steffan Ashe"     
[13] "100000245018012 - Tom Feehan"       
[14] "100000004970313 - Rob Sheahan"      
[15] "1841747558 - Laura Comber"          
[16] "1846686377 - Karen Ni Fhailliun"    
[17] "100000312579635 - Anne Rutherford"  
[18] "100000572764945 - Lit Đ Jsociety"   
[19] "100003033618584 - Fall Ball"        
[20] "100000293776067 - James O'Sullivan" 
[21] "100000104657411 - David Conway"
```
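
As a compact recap of the steps above, here is a sketch on a made-up toy graph (not the `all2` data). It assumes a recent igraph release, where `clusters()` is available under the name `components()` and component IDs start at 1; `members_by_component` is just an illustrative variable name.

```r
library(igraph)

# Hypothetical toy graph with three connected components:
# a-b-c, d-e, and the isolated vertex f.
g <- graph_from_literal(a - b - c, d - e, f)

comp <- components(g)        # same role as clusters() in the answer above

print(comp$no)               # number of components
print(comp$csize)            # size of each component
print(table(comp$membership))  # the same sizes, labelled by component id

# Vertex names grouped by component id.
members_by_component <- split(V(g)$name, comp$membership)
print(members_by_component)

# Vertices of one particular component, selected as in the answer above.
print(V(g)[comp$membership == 1])
```

`split()` returns all member lists at once, which avoids repeating the `which(... == id)` call for every cluster.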
That would cover the basic igraph questions you had. The other questions are more graph-theory related. I don't know of a way to supervise the number of clusters to be created using igraph, but someone may be able to point you to a package which is able to do that. You may have more success posting that as a separate question, either here or in another venue.

Regarding your first point about wanting to iterate through all possible communities, I think you'll find that to be infeasible for a graph of significant size. The number of possible arrangements of the membership vector for 5 different clusters would be 5^n, where n is the size of the graph. If you want to find "all possible communities", that number will actually be O(n^n), if my mental math is correct. Essentially, it would be impossible to calculate that exhaustively over any reasonably sized network, even given massive computational resources. So I think you'll be better off using some sort of intelligence/optimization for determining the number of communities represented in your graph, as the clusters function does.
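
To get a feel for the scale, here is a rough back-of-the-envelope sketch using the 524 vertices from the question and the 5 labelled clusters from the paragraph above. The variable names are illustrative only.

```r
n <- 524   # number of vertices, taken from the question
k <- 5     # number of labelled clusters, as in the paragraph above

# k^n is too large for double-precision arithmetic and evaluates to Inf.
assignments <- k^n
print(assignments)

# Work on the log10 scale instead: this is the order of magnitude
# of the number of possible membership vectors.
log10_assignments <- n * log10(k)
print(log10_assignments)
```

Even this simplified count, which treats the cluster labels as distinct, has hundreds of digits, so exhaustive enumeration is out of reach.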