Question
I am trying to create an interactive Sankey diagram in R using the networkD3 package, as described at http://christophergandrud.github.io/networkD3/#sankey. My data is in the form of discrete state sequences (DSS): each row represents one event sequence, and NA marks the point where a sequence has ended. A sample of the data, recreated in R:
x1 <- c('06002100', '06002001', '06001304', '06002100')
x2 <- c('06002100', '06002001', 'NA', 'NA')
x3 <- c('06001304', '06002100', '06002001', 'NA')
test <- as.data.frame(rbind(x1,x2,x3))
The networkD3 package requires data in the JSON format used by this example file:
URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/","master/JSONdata/energy.json")
Casting the sample data above into the required format would give me (test.json):
{"nodes":[
{"name":"06002100"},
{"name":"06002001"},
{"name":"06001304"}
],
"links":[
{"source":0,"target":1,"value":3},
{"source":1,"target":2,"value":1},
{"source":2,"target":0,"value":2}
]}
Once the data is in the above format, I can use the following code to plot the sankey network.
library(networkD3)
library(jsonlite)
Energy <- fromJSON(txt = 'test.json') # Load the data
result <- as.data.frame(Energy)
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source", Target = "target", Value = "value", NodeID = "name", fontSize = 12, nodeWidth = 30)
I want to transform the DSS data that I have to the format required by networkD3. Is there a direct way to do this?
The networkD3 examples page mentions that I can use the igraph package to create network graph data that can be plotted with networkD3. Unfortunately, I couldn't find good examples of that.
Answer 1:
What sankeyNetwork() ultimately wants is a Links and a Nodes data frame. Assuming that in your DSS data each side-by-side pair of nodes defines a link from left to right, each pair of contiguous columns of your data frame looks like part of a Links data frame with a source and a target column.
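The pairing idea can be sketched on a single sequence. This is only an illustration under the assumption stated above (each state links to the one immediately to its right); seq1 and pairs are hypothetical names, and the values are taken from the question's third sample row.

```r
# One event sequence; NA marks the end of the sequence.
seq1 <- c("06001304", "06002100", "06002001", NA)

# Pair each state with its successor: drop the last element to get the
# sources and drop the first element to get the targets.
pairs <- data.frame(source = head(seq1, -1),
                    target = tail(seq1, -1),
                    stringsAsFactors = FALSE)

# Discard pairs that touch the NA padding.
pairs <- na.omit(pairs)
print(pairs)
```

A sequence of three real states therefore yields two links; doing the same thing column by column over the whole data frame handles all sequences at once.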
First, I fixed your code so that it creates real NA values rather than the string "NA"...
x1 <- c('06002100', '06002001', '06002425', '06009347', '06010001', '06010383', '06009348')
x2 <- c('06002100', '06040401', '06009347', '06039301', NA, NA, NA)
x3 <- c('06001304', '06002001', '06009346', '06002425', '06003303', NA, NA)
x4 <- c('06002100', '06040401', '06009347', '06039301', '06039302', '06032301', '06032301')
test <- as.data.frame(rbind(x1,x2,x3,x4))
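The distinction matters because na.omit() only drops true missing values. A small sketch of the difference follows, along with one possible way to convert existing "NA" strings if your real data already contains them; with_strings, with_missing and fixed are hypothetical names used only for this illustration.

```r
# The string "NA" is ordinary text, so is.na() does not flag it.
with_strings <- c("06002100", "06002001", "NA", "NA")
with_missing <- c("06002100", "06002001", NA, NA)

print(is.na(with_strings))
print(is.na(with_missing))

# If the data already contains "NA" strings, they can be replaced with
# real missing values before building the links.
fixed <- with_strings
fixed[fixed == "NA"] <- NA
```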
Extract a data frame for each pair of contiguous columns in your data frame, bind them into one long Links data frame, and omit rows that contain NAs...
linklist <- lapply(1:(ncol(test) - 1), function(x) data.frame(source = test[[x]], target = test[[x+1]], stringsAsFactors = FALSE))
links <- na.omit(do.call(rbind, linklist))
Make a vector of all unique node names and build a Nodes data frame from it, rebuild the Links data frame using the zero-based index of each name in the Nodes data frame, then plot it...
node_names <- factor(sort(unique(c(as.character(links$source),
as.character(links$target)))))
nodes <- data.frame(name = node_names)
links <- data.frame(source = match(links$source, node_names) - 1,
target = match(links$target, node_names) - 1,
value = 1)
library(networkD3)
sankeyNetwork(links, nodes, "source", "target", "value", "name")
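The code above gives every row of links a value of 1, so a transition that occurs in several sequences appears as several parallel links. The JSON in the question instead has one link per transition with a count as its value. A possible variation, sketched here with base R's aggregate() and not part of the original answer, collapses duplicate transitions into counts; raw_links, agg_links and p are hypothetical names, and the plotting call is guarded so the rest runs even without networkD3 installed.

```r
# Same sample sequences as above, with real NA values as padding.
x1 <- c('06002100', '06002001', '06002425', '06009347', '06010001', '06010383', '06009348')
x2 <- c('06002100', '06040401', '06009347', '06039301', NA, NA, NA)
x3 <- c('06001304', '06002001', '06009346', '06002425', '06003303', NA, NA)
x4 <- c('06002100', '06040401', '06009347', '06039301', '06039302', '06032301', '06032301')
test <- as.data.frame(rbind(x1, x2, x3, x4), stringsAsFactors = FALSE)

# Stack every pair of adjacent columns into one long source/target table.
linklist <- lapply(seq_len(ncol(test) - 1), function(i) {
  data.frame(source = test[[i]], target = test[[i + 1]],
             stringsAsFactors = FALSE)
})
raw_links <- na.omit(do.call(rbind, linklist))

# Count how often each source -> target transition occurs.
raw_links$value <- 1
agg_links <- aggregate(value ~ source + target, data = raw_links, FUN = sum)

# Nodes data frame, and links re-expressed as zero-based node indices.
node_names <- sort(unique(c(agg_links$source, agg_links$target)))
nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)
agg_links$source <- match(agg_links$source, node_names) - 1
agg_links$target <- match(agg_links$target, node_names) - 1

# Build the widget only if networkD3 is available; print p to view it.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = agg_links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "value", NodeID = "name")
}
```

With this variation the width of each link reflects how many sequences contain that transition, which is closer to the links/value layout shown in the question.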
Source: https://stackoverflow.com/questions/45282987/sankey-diagram-for-discrete-state-sequences-in-r-using-networkd3