I have a SQL table that maps, say, authors and books. I would like to group linked authors and books (books written by the same author, and authors who co-wrote a book) toge
A couple of suggestions, assuming aubk is a data.table with columns author_id and book_id:
aubk[,list(author_list = list(sort(author_id))), by = book_id]
will give the sorted list of authors for each book, i.e. one author group per book.
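As a quick illustration, here is a minimal sketch on made-up data. The toy table and the name by_book are assumptions for the example, not part of the original question.

```r
library(data.table)

# Hypothetical toy data: books 1 and 2 share authors 1 and 2; book 3 has author 3
aubk <- data.table(
  author_id = c(2L, 1L, 1L, 2L, 3L),
  book_id   = c(1L, 1L, 2L, 2L, 3L)
)

# One row per book; author_list is a list column of sorted author ids
by_book <- aubk[, list(author_list = list(sort(author_id))), by = book_id]
print(by_book)
```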
The following will create a unique identifier for each group of authors and then return,
for each unique group of authors, the number of books, the number of authors, and the list of books:
aubk[, list(author_list = list(sort(author_id)),
            group_id = paste0(sort(author_id), collapse = ','),
            n_authors = .N), by = book_id][,
     list(n_books = .N,
          n_authors = unique(n_authors),
          book_list = list(book_id),
          book_ids = paste0(book_id, collapse = ', ')), by = group_id]
If the author order matters, just remove the sort calls
from the definitions of author_list
and group_id.
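The effect of sorting on the group key can be seen in a small sketch. The toy data and the names sorted_keys and ordered_keys are assumptions made for illustration only.

```r
library(data.table)

# Hypothetical toy data: books 1 and 2 list the same two authors in different order
aubk <- data.table(
  author_id = c(2L, 1L, 1L, 2L, 3L),
  book_id   = c(1L, 1L, 2L, 2L, 3L)
)

# Sorted key: author order is ignored, so books 1 and 2 fall in the same group
sorted_keys <- aubk[, list(group_id = paste0(sort(author_id), collapse = ',')),
                    by = book_id]

# Unsorted key: author order is kept, so books 1 and 2 get different keys
ordered_keys <- aubk[, list(group_id = paste0(author_id, collapse = ',')),
                     by = book_id]

print(sorted_keys$group_id)   # "1,2" "1,2" "3"
print(ordered_keys$group_id)  # "2,1" "1,2" "3"
```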
Note that the above, while useful, does not do the appropriate grouping.
Perhaps the following will:
# the unique groups of authors by book, as a list of sorted author vectors
unique_authors <- unique(aubk[, list(author_list = list(sort(author_id))), by = book_id][['author_list']])
# some helper functions
# a filter function that allows arguments to be passed
.Filter <- function(f, x, ...) {
  ind <- as.logical(sapply(x, f, ...))
  x[!is.na(ind) & ind]
}
# any(x in y)?
`%%in%%` <- function(x,table){any(unlist(x) %in% table)}
# function to filter a list and return the unique elements from
# flattened values
FilterList <- function(.list, table) {
  unique(unlist(.Filter(`%%in%%`, .list, table = table)))
}
# all the authors
all_authors <- unique(unlist(unique_authors))
# with names!
setattr(all_authors, 'names', all_authors)
# get for each author, the authors with whom they have
# collaborated in at least 1 book
lapply(all_authors, FilterList, .list = unique_authors)
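To see what shape the final call returns, here is a minimal base R sketch on made-up data. It uses base Filter with an anonymous function in place of the helpers above, and the author labels and the name collaborators are hypothetical. Each author is mapped to everyone who shares at least one book with them (the author included).

```r
# Hypothetical toy data: one sorted author vector per book
unique_authors <- list(c("A", "B"), c("B", "C"), c("D"))

# All authors, named by themselves so the result is a named list
all_authors <- unique(unlist(unique_authors))
names(all_authors) <- all_authors

# For each author, keep the groups that contain them and flatten the result
collaborators <- lapply(all_authors, function(a) {
  hits <- Filter(function(grp) a %in% grp, unique_authors)
  unique(unlist(hits))
})

print(collaborators)
# $A: "A" "B"
# $B: "A" "B" "C"
# $C: "B" "C"
# $D: "D"
```

In this toy case A and C do not appear in each other's entries even though both worked with B, so the result lists direct collaborators only.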