I have a SQL table that maps, say, authors and books. I would like to group linked authors and books (books written by the same author, and authors who co-wrote a book) toge
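Here's an attempt that reworks my answer to an old question of mine, which Josh O'Brien linked in the comments (identify groups of linked episodes which chain together). It uses the igraph library: authors and books become vertices, each row of the table becomes an edge, and every connected component of the resulting graph is one group.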
# Dummy data that might be easier to interpret to show it worked
# Authors 1,2 and 3,4 should group; author 5 is a group on their own
aubk <- data.frame(author_id=c(1,2,3,4,5),book_id=c(1,1,2,2,5))
# identify authors with a bit of leading text to prevent clashes
# with the book ids
aubk$author_id2 <- paste0("au",aubk$author_id)
library(igraph)
# create a graph - this needs a two-column matrix of edges as input
# (graph_from_edgelist() and components() are the current names for
# the older graph.edgelist() and clusters())
au_graph <- graph_from_edgelist(as.matrix(aubk[c("author_id2","book_id")]))
# get the vertex names (authors and books)
result <- data.frame(author_id=V(au_graph)$name,stringsAsFactors=FALSE)
# get the corresponding group membership of each vertex
result$group <- components(au_graph)$membership
# subset to only the authors data
result <- result[substr(result$author_id,1,2)=="au",]
# make the author_id variable numeric again
result$author_id <- as.numeric(substr(result$author_id,3,nchar(result$author_id)))
> result
  author_id group
1         1     1
3         2     1
4         3     2
6         4     2
7         5     3
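The result above lists authors only. If you also want the group next to each book, one option is to look the group up by vertex name and write it back onto the original table. This is only a sketch under a few assumptions of mine: the "bk" prefix and the names `edges`, `g` and `memb` are illustrative, and the graph is built as undirected, which gives the same components here.

```r
library(igraph)

# Same dummy data as above
aubk <- data.frame(author_id = c(1, 2, 3, 4, 5), book_id = c(1, 1, 2, 2, 5))

# Prefix both id types so author 1 and book 1 are distinct vertices
# ("au"/"bk" prefixes are illustrative choices)
edges <- cbind(paste0("au", aubk$author_id), paste0("bk", aubk$book_id))
g <- graph_from_edgelist(edges, directed = FALSE)

# Name the membership vector by vertex so it can be indexed by label
memb <- setNames(components(g)$membership, V(g)$name)

# Each row's author and book sit in the same component,
# so looking up the author vertex gives the group for the whole row
aubk$group <- unname(memb[edges[, 1]])
aubk
```

Every row of `aubk` then carries a `group` column, so both `author_id` and `book_id` can be aggregated by group without parsing the prefixed names back out.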