R: Understanding Graph

Posted by 时光怂恿深爱的人放手 on 2020-12-27 05:58:09

Question


I am using the R programming language and the "igraph" library. I am trying to better understand graph structures for "two-mode" graphs (graphs in which there are two types of nodes). In particular, I am trying to understand how to "project" two-mode graphs (to my understanding, these are usually "bipartite"). (https://rpubs.com/pjmurphy/317838)

For instance, I created a graph of relationships between "men" and "women". Although this graph has two modes (men and women), I don't think that this graph is bipartite, since edges can exist between nodes of the same type:

library(igraph)

# I don't think this is a bipartite graph
gender_data <- data.frame(
    
    "men" = c("john", "kevin", "mark", "kevin", "kevin", "mark", "henry", "mark", "susan", "john", "henry", "susan", "susan", "janet", "janet", "henry", "henry", "john"),
    "women" = c("janet", "janet", "sarah", "lucy", "lucy", "susan", "janet", "susan", "lucy", "kevin", "lucy", "janet", "kevin", "mark", "lucy", "sarah", "mark", "mark")
)

# create an undirected graph (directed = F), then drop duplicate edges
graph <- graph.data.frame(gender_data, directed=F)
graph <- simplify(graph)

V(graph)["john"]$color<-"red"
V(graph)["kevin"]$color<-"red"
V(graph)["mark"]$color<-"red"
V(graph)["janet"]$color<-"blue"
V(graph)["sarah"]$color<-"blue"
V(graph)["lucy"]$color<-"blue"
V(graph)["henry"]$color<-"red"
V(graph)["susan"]$color<-"blue"

plot(graph)

I read that a better way to understand bipartite graphs is through "actors and movies". Different actors can be in the same movie and one actor can be in different movies, but an actor cannot share an edge with another actor and a movie cannot share an edge with another movie. Here is my interpretation of such a network:

film_data <- data.frame(
    
    "movie" = c("movie_1", "movie_1", "movie_1", "movie_2", "movie_2", "movie_2", "movie_3", "movie_3", "movie_3", "movie_4", "movie_4", "movie_4", "movie_4", "movie_5", "movie_5", "movie_5", "movie_6", "movie_6"),
    "actor" = c("actor_1", "actor_2", "actor_3", "actor_2", "actor_3", "actor_4", "actor_1", "actor_5", "actor_6", "actor_2", "actor_7", "actor_1", "actor_8", "actor_5", "actor_9", "actor_3", "actor_2", "actor_8")
)

# create an undirected graph (directed = F), then drop duplicate edges
graph <- graph.data.frame(film_data, directed=F)
graph <- simplify(graph)
plot(graph)

However, (according to this stackoverflow post here: valued bipartite projection using R igraph ), this actor graph is still not bipartite (I don't understand why):

is.bipartite(graph)
[1] FALSE

According to the same stackoverflow post, the actor graph can still be converted into a bipartite graph (I don't understand what just happened):

V(graph)$type <- V(graph)$name %in% film_data[,1]
is.bipartite(graph)
[1] TRUE

From here, a projection can be made that "projects" two separate graphs:

proj<-bipartite.projection(graph, V(graph)$type,multiplicity = TRUE)
proj

$proj1
IGRAPH b5bc5ca UNW- 9 16 -- 
+ attr: name (v/c), weight (e/n)
+ edges from b5bc5ca (vertex names):
 [1] actor_1--actor_2 actor_1--actor_3 actor_1--actor_5 actor_1--actor_6 actor_1--actor_7 actor_1--actor_8 actor_2--actor_3 actor_2--actor_4
 [9] actor_2--actor_7 actor_2--actor_8 actor_3--actor_4 actor_3--actor_5 actor_3--actor_9 actor_5--actor_6 actor_5--actor_9 actor_7--actor_8

$proj2
IGRAPH b5bc5ca UNW- 6 11 -- 
+ attr: name (v/c), weight (e/n)
+ edges from b5bc5ca (vertex names):
 [1] movie_1--movie_3 movie_1--movie_4 movie_1--movie_2 movie_1--movie_6 movie_1--movie_5 movie_2--movie_4 movie_2--movie_6 movie_2--movie_5
 [9] movie_3--movie_4 movie_3--movie_5 movie_4--movie_6

Finally, the two projections can be plotted:

plot(proj$proj1)
plot(proj$proj2)

My questions:

  1. Why wasn't the original actor-movie graph "bipartite"? After all, it was undirected and cyclic.

  2. Why does the line V(graph)$type <- V(graph)$name %in% film_data[,1] transform the actor-movie graph into a bipartite graph?

  3. Is there any reason that neither projection is reported as bipartite?

    is.bipartite(proj$proj1)
    [1] FALSE

    is.bipartite(proj$proj2)
    [1] FALSE

  4. How does this line proj<-bipartite.projection(graph, V(graph)$type,multiplicity = TRUE) "work"? In the original actor-movie graph, I specifically entered the data so that there are no direct relationships between two movies or two actors. For instance, in "proj2" there is an edge between "movie_1" and "movie_2". How and why did this happen? In my original data, there is no such direct relationship between movie_1 and movie_2.

  5. Suppose actor_1, actor_2, actor_3, actor_4 are male and actor_5, actor_6, actor_7, actor_8, actor_9 are female. Is there a way to now make 3 projections? Projection for male actors, projection for female actors and projections for movies?

Thanks


Answer 1:


In addition to your actors and movies-analogy, I would like to add that an actor can only be connected to 0 or more movies, never to other actors. And movies can only be connected to 0 or more actors. Now, for the questions:

A1.

When the output of a certain function doesn't match your expectation, it is often helpful to look at the help page for that function. This command will explain the first question:

?is.bipartite

Bipartite graphs have a type vertex attribute in igraph, this is boolean and FALSE for the vertices of the first kind and TRUE for vertices of the second kind.

[...]

is_bipartite checks whether the graph is bipartite or not. It just checks whether the graph has a vertex attribute called type.

So, is_bipartite doesn't consider the original actor-movie graph to be bipartite, because the graph doesn't have a vertex attribute called type. There simply is no information in graph that tells it which set each vertex belongs to. We'll add this information in the next question:
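To make the difference concrete, here is a minimal sketch on a tiny toy edge list (the names m1, a1, etc. are made up for illustration and are not the data from the question). It contrasts is_bipartite(), which only looks for the type attribute, with bipartite_mapping(), which inspects the edges themselves:

```r
library(igraph)

# Hypothetical toy data: two movies, two actors
edges <- data.frame(
  movie = c("m1", "m1", "m2"),
  actor = c("a1", "a2", "a2")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# No "type" vertex attribute has been set yet
has_type <- "type" %in% vertex_attr_names(g)   # FALSE
flagged  <- is_bipartite(g)                    # FALSE, because the attribute is missing

# bipartite_mapping() tests the structure: can the vertices be split
# into two sets with no edge inside either set?
structural <- bipartite_mapping(g)$res         # TRUE for this toy graph

print(c(has_type = has_type, flagged = flagged, structural = structural))
```

So the graph can be bipartite in the graph-theory sense while is_bipartite() still answers FALSE.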

A2.

Here we'll look at the example you already found, and I'll try to explain it. Let's first check the help page again:

?V

Create a vertex sequence (vs) containing all vertices of a graph. [...]

This function V() creates a sequence of vertices from a graph. V(graph) will list all vertices in graph. We want V(graph)$type to contain the essential attribute type.

As explained in the first help page, V(graph)$type needs to contain a TRUE/FALSE value for each vertex in graph, which is what is done in this code:

V(graph)$type <- V(graph)$name %in% film_data[,1]

V(graph)$name is a vector that contains the names of all vertices. film_data[,1] is the first column of the data frame, i.e. the names of all movies. Print these two vectors in R to study their contents and you'll see what I mean.

Finally, the %in% operator checks, for each item on the left, if it exists in the vector on the right. If so, it returns TRUE. If not, it returns FALSE. In this case it will return a vector with TRUE for each movie and FALSE for each actor.

The complete construct V(graph)$name %in% film_data[,1] thus creates a vector of TRUEs and FALSEs, where TRUE marks the movies (igraph's vertices "of the second kind") and FALSE marks the actors (vertices "of the first kind"). And as the help page said, we can make igraph treat our graph as bipartite by simply storing this information in V(graph)$type.
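Here is a small sketch of that construct in isolation, using a three-row toy data frame (an assumption for brevity, not the full film_data from the question):

```r
library(igraph)

# Hypothetical three-row version of the actor-movie data
film <- data.frame(
  movie = c("movie_1", "movie_1", "movie_2"),
  actor = c("actor_1", "actor_2", "actor_2")
)
g <- graph_from_data_frame(film, directed = FALSE)

# TRUE for every vertex whose name appears in the movie column
is_movie <- V(g)$name %in% film$movie
print(data.frame(name = V(g)$name, is_movie = is_movie))

# Storing the vector as the "type" attribute is all is_bipartite() looks for
V(g)$type <- is_movie
print(is_bipartite(g))   # TRUE
```

Which set gets TRUE is a free choice; it only decides which projection ends up as proj1 (the FALSE vertices) and which as proj2 (the TRUE vertices).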

A3.

If we look at V(proj$proj1)$type, we see that proj$proj1 doesn't contain the type attribute, which is the only thing is.bipartite checks (see A1). Again, the graph doesn't know whether its vertices belong to the primary or the secondary group; this information was removed when running bipartite.projection(). But this time it's not necessary: we know it's not a bipartite graph because it only contains one set.

You can optionally retain this information with the option remove.type = F in bipartite.projection().
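A short sketch of the effect of remove.type, again on made-up toy data. It assumes the graph already carries a type attribute, so that bipartite_projection() can pick it up by default:

```r
library(igraph)

# Hypothetical toy data with a "type" attribute (TRUE = movie)
film <- data.frame(
  movie = c("movie_1", "movie_1", "movie_2"),
  actor = c("actor_1", "actor_2", "actor_2")
)
g <- graph_from_data_frame(film, directed = FALSE)
V(g)$type <- V(g)$name %in% film$movie

proj_default <- bipartite_projection(g)                       # remove.type = TRUE
proj_keep    <- bipartite_projection(g, remove.type = FALSE)

# By default the projections lose the attribute, so is_bipartite() says FALSE
dropped <- "type" %in% vertex_attr_names(proj_default$proj1)  # FALSE
print(is_bipartite(proj_default$proj1))

# With remove.type = FALSE the attribute survives in the projection
kept <- "type" %in% vertex_attr_names(proj_keep$proj1)        # TRUE
print(kept)
```

Note that keeping the attribute does not make the one-mode projection bipartite in any structural sense; it only preserves the label.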

A4.

The bipartite projection shows which actors share the same movies, and which movies share the same actors.

Example: In your example data, we can see Actor 6 is connected with only Movie 3. Movie 3 is also connected with Actors 1 and 5. The bipartite projection will show Actor 6 connected with only Actors 1 and 5.
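The same reasoning explains the movie_1/movie_2 edge from the question. As a rough sketch (using only the first nine rows of the question's film_data, to keep it short), the shared actors can be counted by hand and compared with the weight igraph assigns when multiplicity = TRUE:

```r
library(igraph)

# First nine rows of the question's data (movie_1 to movie_3 only)
film_data <- data.frame(
  movie = c("movie_1", "movie_1", "movie_1", "movie_2", "movie_2", "movie_2",
            "movie_3", "movie_3", "movie_3"),
  actor = c("actor_1", "actor_2", "actor_3", "actor_2", "actor_3", "actor_4",
            "actor_1", "actor_5", "actor_6")
)

# By hand: movie-by-actor incidence matrix, then count shared actors
B <- unclass(table(film_data$movie, film_data$actor))
shared <- B %*% t(B)
print(shared["movie_1", "movie_2"])   # 2 (actor_2 and actor_3)

# With igraph: the movie projection stores the same count as edge weight
g <- graph_from_data_frame(film_data, directed = FALSE)
V(g)$type <- V(g)$name %in% film_data$movie
movies <- bipartite_projection(g, multiplicity = TRUE)$proj2
adj <- as_adjacency_matrix(movies, attr = "weight", sparse = FALSE)
print(adj["movie_1", "movie_2"])      # 2
print(adj["movie_2", "movie_3"])      # 0, no shared actor, so no edge
```

So an edge in proj2 never comes from a direct movie-movie row in the data; it is derived from two movies having at least one actor in common.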

A5.

Here I will design a graph with two sets (actors and movies), of which actors have an extra attribute (male) to specify their gender.

The way you constructed the graph before supplied only the edges, with no per-vertex attributes such as set membership or gender. There are multiple ways to create a graph mentioned in the igraph manual pages. I will demonstrate one close to your method:

items <- data.frame(
    name  = c("actorM1","actorM2","actorM3","actorF1","actorF2","actorF3","actorF4","movie1","movie2","movie3"),
    movie = c(F,F,F,F,F,F,F,T,T,T),
    male  = c(T,T,T,F,F,F,F,NA,NA,NA)
)
items

relations <- data.frame(
    a = c("actorM1","actorM1","actorM2","actorM3","actorM3","actorM3","actorF1","actorF2","actorF2","actorF3","actorF3","actorF3","actorF4"),
    b = c("movie1", "movie2", "movie3", "movie1", "movie2", "movie3", "movie2", "movie2", "movie3", "movie1", "movie2", "movie3", "movie3")
)
relations

graph <- graph_from_data_frame(relations, directed=F, vertices=items)
graph
plot(graph)

Above I have created two dataframes:

  • items contains an entry for each item (7 actors and 3 movies) along with their characteristics (are they a movie, are they male), and
  • relations lists how they are connected.

I then merged these two into a graph with graph_from_data_frame().

You'll remember the next step: I assign sets based on the value of $movie. Then I plot the movies. Don't plot the actors yet, because we still need to split men from women.

actors_movies <- bipartite.projection(graph, types = V(graph)$movie, remove.type = F)
plot(actors_movies$proj2)

I couldn't find a nicer solution to split up this group than this: remove all women from the projection to plot all men, then remove all men from the projection to plot all women. Note that the information $male is still available thanks to the option remove.type = F in bipartite.projection().

male = delete_vertices(actors_movies$proj1, V(actors_movies$proj1)$male == F)
plot(male)

female = delete_vertices(actors_movies$proj1, V(actors_movies$proj1)$male == T)
plot(female)

I hope this is helpful for you. At least I enjoyed learning about igraph.




Answer 2:


Great answer Caspar V.! I just have one comment:

Suppose we "sabotage" the actor-movie graph by making one of the actors connected to another actor (actor_2 and actor_3):

film_data <- data.frame(
    
    "movie" = c("movie_1", "movie_1", "movie_1", "movie_2", "movie_2", "movie_2", "movie_3", "movie_3", "movie_3", "movie_4", "movie_4", "movie_4", "movie_4", "movie_5", "movie_5", "movie_5", "movie_6", "movie_6", "actor_2"),
    "actor" = c("actor_1", "actor_2", "actor_3", "actor_2", "actor_3", "actor_4", "actor_1", "actor_5", "actor_6", "actor_2", "actor_7", "actor_1", "actor_8", "actor_5", "actor_9", "actor_3", "actor_2", "actor_8", "actor_3")
)

# create an undirected graph (directed = F), then drop duplicate edges
graph <- graph.data.frame(film_data, directed=F)
graph <- simplify(graph)
plot(graph)

As far as I understand, now this graph is not bipartite.

However, if we use the code you provided:

V(graph)$type <- V(graph)$name %in% film_data[,1]

is.bipartite(graph)

This returns a value of "TRUE".

Can you please provide your opinion on this? Is this new modified graph bipartite or not bipartite?

Thank you!
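One way to examine this, sketched on a three-vertex toy version of the sabotaged graph (not the full data above): since is_bipartite() only checks that a type attribute exists, it cannot notice an invalid assignment. Checking the edges against the types, or asking bipartite_mapping() for a structural test, gives a different answer:

```r
library(igraph)

# Hypothetical minimal "sabotaged" data: actor_2 also appears in the movie column
edges <- data.frame(
  movie = c("movie_1", "movie_1", "actor_2"),
  actor = c("actor_2", "actor_3", "actor_3")
)
g <- graph_from_data_frame(edges, directed = FALSE)
V(g)$type <- V(g)$name %in% edges$movie

# Only checks that the attribute exists
attr_check <- is_bipartite(g)                  # TRUE

# Does every edge join a TRUE vertex with a FALSE vertex?
e <- ends(g, E(g), names = FALSE)
valid_types <- all(V(g)$type[e[, 1]] != V(g)$type[e[, 2]])   # FALSE

# Structural test: the three vertices form a triangle, so no valid split exists
structural <- bipartite_mapping(g)$res         # FALSE

print(c(attr_check = attr_check, valid_types = valid_types, structural = structural))
```

On this toy graph the attribute check passes while both edge-based checks fail, which suggests the modified graph is not bipartite even though is.bipartite() returns TRUE.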



Source: https://stackoverflow.com/questions/65243162/r-understanding-graph
