Convert a data frame to a treeNetwork compatible list

Submitted by 怎甘沉沦 on 2019-12-09 16:18:05

Question


Consider the following data frame:

   Country     Provinces          City Zone
1   Canada   Newfondland      St Johns    A
2   Canada           PEI Charlottetown    B
3   Canada   Nova Scotia       Halifax    C
4   Canada New Brunswick   Fredericton    D
5   Canada        Quebec            NA   NA
6   Canada        Quebec   Quebec City   NA
7   Canada       Ontario       Toronto    A
8   Canada       Ontario        Ottawa    B
9   Canada      Manitoba      Winnipeg    C
10  Canada  Saskatchewan        Regina    D

Would there be a clever way to convert it to a treeNetwork compatible list (from the networkD3 package) in the form of:

CanadaPC <- list(name = "Canada",
                 children = list(
                   list(name = "Newfoundland",
                        children = list(list(name = "St. John's",
                                             children = list(list(name = "A"))))),
                   list(name = "PEI",
                        children = list(list(name = "Charlottetown",
                                             children = list(list(name = "B"))))),
                   list(name = "Nova Scotia",
                        children = list(list(name = "Halifax",
                                             children = list(list(name = "C"))))),
                   list(name = "New Brunswick",
                        children = list(list(name = "Fredericton",
                                             children = list(list(name = "D"))))),
                   list(name = "Quebec",
                        children = list(list(name = "Quebec City"))),
                   list(name = "Ontario",
                        children = list(list(name = "Toronto",
                                             children = list(list(name = "A"))),
                                        list(name = "Ottawa",
                                             children = list(list(name = "B"))))),
                   list(name = "Manitoba",
                        children = list(list(name = "Winnipeg",
                                             children = list(list(name = "C"))))),
                   list(name = "Saskatchewan",
                        children = list(list(name = "Regina",
                                             children = list(list(name = "D")))))))

The goal is to plot a Reingold-Tilford tree with an arbitrary number of levels.

I have tried several sub-optimal routines, including a messy combination of for loops, but I can't get the data into the desired format.

Ideally, the function would scale: the first column would be treated as the root (starting point) and each remaining column as a successive level of children.


Edit

A similar question was asked on the same topic, and @MrFlick provided an interesting recursive function. The original data frame had a fixed set of levels. I introduced NAs to add another level of complexity (an arbitrary set of levels) that is not addressed in @MrFlick's initial solution.


Data

structure(list(Country = c("Canada", "Canada", "Canada", "Canada", 
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"), 
    Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick", 
    "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
    ), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton", 
    NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
    ), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C", 
    "D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country", 
"Provinces", "City", "Zone"))

Answer 1:


A better strategy for this scenario may be a recursive split(). Here is such an implementation. First, the sample data:

dd<-structure(list(Country = c("Canada", "Canada", "Canada", "Canada", 
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"), 
    Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick", 
    "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
    ), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton", 
    NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
    ), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C", 
    "D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country", 
"Provinces", "City", "Zone"))

Note that I've replaced the "NA" strings with true NA values. Now, the function:

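rsplit <- function(x) {
    # Drop rows whose current level is NA; they contribute no node here
    x <- x[!is.na(x[,1]),,drop=FALSE]
    if(nrow(x)==0) return(NULL)
    # Last column: every remaining value becomes a leaf
    if(ncol(x)==1) return(lapply(x[,1], function(v) list(name=v)))
    # Group the remaining columns by the current level and recurse
    s <- split(x[,-1, drop=FALSE], x[,1])
    unname(mapply(function(v,n) {
        if(!is.null(v)) list(name=n, children=v) else list(name=n)
    }, lapply(s, rsplit), names(s), SIMPLIFY=FALSE))
}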

Then we can run

rsplit(dd)
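
Note that rsplit(dd) returns a list with one element per distinct value in the first column, so with a single country the root node is its first element. The sketch below repeats the data and function so it runs on its own, pulls out that root, and inspects it. The final plotting call is an assumption: it uses diagonalNetwork() from recent networkD3 releases, since whether treeNetwork() is still available depends on the installed version.

```r
# Sample data from the question (true NA values, not "NA" strings).
dd <- data.frame(
  Country   = "Canada",
  Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick", "Quebec",
                "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"),
  City      = c("St Johns", "Charlottetown", "Halifax", "Fredericton", NA,
                "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"),
  Zone      = c("A", "B", "C", "D", NA, NA, "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# rsplit() as given in the answer.
rsplit <- function(x) {
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  if (ncol(x) == 1) return(lapply(x[, 1], function(v) list(name = v)))
  s <- split(x[, -1, drop = FALSE], x[, 1])
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit), names(s), SIMPLIFY = FALSE))
}

roots    <- rsplit(dd)   # list of root nodes, one per distinct Country
CanadaPC <- roots[[1]]   # the single root node, "Canada"
str(CanadaPC, max.level = 2)

# Assumption: a networkD3 version that provides diagonalNetwork() is installed.
if (requireNamespace("networkD3", quietly = TRUE)) {
  widget <- networkD3::diagonalNetwork(List = CanadaPC)
}
```

Rows with NA in a column simply stop contributing nodes at that depth, which is why "Quebec City" ends up as a leaf with no children element.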

It seems to work with the test data. The only difference is the order in which the child nodes are arranged: split() groups by sorted factor levels rather than by order of first appearance.
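
If the original row order matters, one possible adjustment is to split on a factor whose levels follow order of first appearance. The following is a minimal sketch of that idea, not part of the original answer; rsplit_ordered and the small df are hypothetical names used only for illustration.

```r
# Variant of rsplit() that keeps children in order of first appearance.
rsplit_ordered <- function(x) {
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  key <- as.character(x[, 1])
  if (ncol(x) == 1) return(lapply(key, function(v) list(name = v)))
  # Levels taken from unique() preserve the order in which values occur.
  s <- split(x[, -1, drop = FALSE], factor(key, levels = unique(key)))
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit_ordered), names(s), SIMPLIFY = FALSE))
}

# Small illustrative data frame (hypothetical subset of the question's data).
df <- data.frame(
  Country   = "Canada",
  Provinces = c("Quebec", "Ontario", "Ontario"),
  City      = c("Quebec City", "Toronto", "Ottawa"),
  stringsAsFactors = FALSE
)

tree <- rsplit_ordered(df)[[1]]
province_names <- sapply(tree$children, function(node) node$name)
print(province_names)  # "Quebec" "Ontario" (alphabetical order would put Ontario first)
```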



Source: https://stackoverflow.com/questions/30734572/convert-a-data-frame-to-a-treenetwork-compatible-list
