问题
I am working on families trees :
I have adapted Bob Horton's example based on sqldf https://www.r-bloggers.com/exploring-recursive-ctes-with-sqldf/
My data :
person father
Guillou Arthur NA
Cleach Marc NA
Guillou Eric Guillou Arthur
Guillou Jacques Guillou Arthur
Cleach Franck Cleach Marc
Cleach Leo Cleach Marc
Cleach Herbet Cleach Leo
Cleach Adele Cleach Herbet
Guillou Jean Guillou Eric
Guillou Alan Guillou Eric
My results, descendants ordered by levels of "Guillou Arthur" (top person without father) :
name parent_name level
Guillou Arthur NA 1
Guillou Eric Guillou Arthur 2
Guillou Jacques Guillou Arthur 2
Guillou Alan Guillou Eric 3
Guillou Jean Guillou Eric 3
You can built this table with recursive query with sqldf :
The data :
person <- c("Guillou Arthur",
"Cleach Marc",
"Guillou Eric",
"Guillou Jacques",
"Cleach Franck",
"Cleach Leo",
"Cleach Herbet",
"Cleach Adele",
"Guillou Jean",
"Guillou Alan" )
father <- c(NA, NA, "Guillou Arthur" , "Guillou Arthur", "Cleach Marc", "Cleach Marc", "Cleach Leo", "Cleach Herbet", "Guillou Eric", "Guillou Eric")
family <- data.frame(person, father)
Large to long format conversion :
library(tidyr)
long_family <- gather(family, parent, parent_name, -person)
long_family
Recursive query to find descendants of "Guillou Arthur" (top person without father) :
library(sqldf)
descendants_sql <- "
WITH RECURSIVE descendants (name, parent_name, level) AS (
SELECT person, parent_name, 1 FROM long_family
WHERE person = '%s'
AND parent = '%s'
UNION ALL
SELECT F.person, F.parent_name, D.level + 1
FROM descendants D
JOIN long_family F
ON F.parent_name = D.name)
SELECT * FROM descendants ORDER BY level, name
"
fam <- sqldf(sprintf(descendants_sql, 'Guillou Arthur', 'father'))
fam
My question :
How can I create a data.frame object including all families trees directly with R (and not sql).
Each tree starts with a patriarch (without father) like "Cleach Marc". (with R method or sqldf method)
回答1:
We build a recursive function to get the father line, from there everything is easy.
First we define the data with stringsAsFactors = FALSE
for smoother reformatting.
family <- data.frame(person, father,stringsAsFactors = FALSE)
the function
father_line <- function(x){
dad <- subset(family,person==x)$father
if(is.na(dad)) return(x)
c(x,father_line(dad))
}
father_line ("Guillou Alan")
# [1] "Guillou Alan" "Guillou Eric" "Guillou Arthur"
Use it to get level and other things
family$father_line <- lapply(family$person,father_line)
family$level <- lengths(family$father_line)
family$patriarch <- sapply(family$father_line,tail,1)
# person father father_line level patriarch
# 1 Guillou Arthur <NA> Guillou Arthur 1 Guillou Arthur
# 2 Cleach Marc <NA> Cleach Marc 1 Cleach Marc
# 3 Guillou Eric Guillou Arthur Guillou Eric, Guillou Arthur 2 Guillou Arthur
# 4 Guillou Jacques Guillou Arthur Guillou Jacques, Guillou Arthur 2 Guillou Arthur
# 5 Cleach Franck Cleach Marc Cleach Franck, Cleach Marc 2 Cleach Marc
# 6 Cleach Leo Cleach Marc Cleach Leo, Cleach Marc 2 Cleach Marc
# 7 Cleach Herbet Cleach Leo Cleach Herbet, Cleach Leo, Cleach Marc 3 Cleach Marc
# 8 Cleach Adele Cleach Herbet Cleach Adele, Cleach Herbet, Cleach Leo, Cleach Marc 4 Cleach Marc
# 9 Guillou Jean Guillou Eric Guillou Jean, Guillou Eric, Guillou Arthur 3 Guillou Arthur
# 10 Guillou Alan Guillou Eric Guillou Alan, Guillou Eric, Guillou Arthur 3 Guillou Arthur
For example to get stated expected output:
subset(family,patriarch == "Guillou Arthur",select=c(person,father,level))
# person father level
# 1 Guillou Arthur <NA> 1
# 3 Guillou Eric Guillou Arthur 2
# 4 Guillou Jacques Guillou Arthur 2
# 9 Guillou Jean Guillou Eric 3
# 10 Guillou Alan Guillou Eric 3
The tidyverse
way it would look like this:
library(tidyverse)
family %>%
mutate(family_line = map(person,father_line),
level = lengths(family_line),
patriarch = map(family_line,last)) %>%
filter(patriarch == "Guillou Arthur") %>%
select(person,father,level)
# person father level
# 1 Guillou Arthur <NA> 1
# 2 Guillou Eric Guillou Arthur 2
# 3 Guillou Jacques Guillou Arthur 2
# 4 Guillou Jean Guillou Eric 3
# 5 Guillou Alan Guillou Eric 3
回答2:
You can probably do this using graph tools. so using igraph
, you can get neighbours using ego
functions.
A quick sketch (which needs checking!)
library(igraph)
family[] = lapply(family, factor, levels=unique(unlist(family)))
g = graph_from_adjacency_matrix(table(family))
cg = connect.neighborhood(g, order=length(V(g)), mode="out")
cbind( V(cg)$name,
sapply(ego(g, mode="out", mindist=1), function(x) replace(names(x), length(names(x))==0, NA)),
ego_size(cg, mode="out") )[grep("Guillou", V(cg)$name),]
[,1] [,2] [,3]
[1,] "Guillou Arthur" NA "1"
[2,] "Guillou Eric" "Guillou Arthur" "2"
[3,] "Guillou Jacques" "Guillou Arthur" "2"
[4,] "Guillou Jean" "Guillou Eric" "3"
[5,] "Guillou Alan" "Guillou Eric" "3"
In fact maybe you dont need to create a neighbourhood graph and can get by with:
cbind( V(g)$name,
sapply(ego(g, mode="out", mindist=1), function(x) replace(names(x), length(names(x))==0, NA)),
ego_size(g, mode="out", order=length(V(g))) )[grep("Cleach", V(g)$name),]
来源:https://stackoverflow.com/questions/49607454/built-family-nested-tree-parent-children-relationship-in-r