Question
I am interested in visualizing the pathways patients follow, based on a pre-specified list of events (e.g. diagnosis, surgery, treatment1, treatment2, death).
A test data set might look like this:
df <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
Event = structure(c(2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
5L, 1L), .Label = c("death", "diagnosis", "surgery", "treatment1",
"treatment2"), class = "factor"), date = structure(c(14610,
14619, 16667, 14975, 14976, 14977, 15074, 15084, 15006, 15050,
15051, 15053), class = "Date")), .Names = c("ID", "Event",
"date"), row.names = c(NA, 12L), class = "data.frame")
> df
   ID      Event       date
1   a  diagnosis 2010-01-01
2   a    surgery 2010-01-10
3   a      death 2015-08-20
4   b  diagnosis 2011-01-01
5   b    surgery 2011-01-02
6   b treatment1 2011-01-03
7   b treatment2 2011-04-10
8   b      death 2011-04-20
9   c  diagnosis 2011-02-01
10  c    surgery 2011-03-17
11  c treatment2 2011-03-18
12  c      death 2011-03-20
The data have been ordered by ID and date.
What I am after is the following:
> result
  ID     parent      child datediff
1  a  diagnosis    surgery        9
2  a    surgery      death     1950
3  b  diagnosis    surgery        1
4  b    surgery treatment1        1
5  b treatment1 treatment2       90
6  b treatment2      death       10
7  c  diagnosis    surgery       45
8  c    surgery treatment2        1
9  c treatment2      death        2
(Note that the numbers in the datediff column are illustrative, not computed from the data above.) That is, a series of parent-child nodes with the difference in dates between them.
This will allow me to plot the nodes, and do some further descriptive analysis on time between events.
I found a package to plot nodes (see below); however, if someone knows a way or a package that lets the arrow width reflect the number of parent-child combinations, that would be awesome!
library(igraph) # possible package to use
parents  <- c("A","A","A","A","A","A","C","C","F","F","H","I")
children <- c("I","I","I","I","B","A","D","H","G","H","I","J")
begats <- data.frame(parents = parents, children = children)
# graph_from_data_frame() is the current name for graph.data.frame()
graph_begats <- graph_from_data_frame(begats)
tkplot(graph_begats)
Cheers, Luc
Answer 1:
Collapse your data so that each parent-child combination appears once, with a count of how many times it occurred, e.g.:
# put the previous event against the current event, and drop the rows before the first event:
df$Event <- as.character(df$Event)
df$PreEvent <- with(df, ave(Event,ID,FUN=function(x) c(NA,head(x,-1)) ) )
result <- df[!is.na(df$PreEvent),c("ID","PreEvent","Event")]
# aggregate the combos by how often they occur:
result <- aggregate(list(count=rownames(result)),result[c("PreEvent","Event")],FUN=length)
#     PreEvent      Event count
#1     surgery      death     1
#2  treatment2      death     2
#3   diagnosis    surgery     3
#4     surgery treatment1     1
#5     surgery treatment2     1
#6  treatment1 treatment2     1
# plot in igraph, adjusting the edge.width to account for how many cases of each
# parent-child combo exist:
library(igraph)
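# graph_from_data_frame() is the current name for graph.data.frame();
# the extra "count" column is carried along as an edge attribute
g <- graph_from_data_frame(result)
plot(g, edge.width = E(g)$count)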
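The code above gives the edge counts, but not the per-patient table with datediff that the question asks for. Here is a minimal sketch of one way to get it, reusing the same ave() lag trick on the dates. The names pairs, edges and median_days are made up for illustration, and the toy data are rebuilt with data.frame() so the block runs on its own:

```r
# Toy data from the question, rebuilt with plain constructors
df <- data.frame(
  ID    = c("a","a","a","b","b","b","b","b","c","c","c","c"),
  Event = c("diagnosis","surgery","death",
            "diagnosis","surgery","treatment1","treatment2","death",
            "diagnosis","surgery","treatment2","death"),
  date  = as.Date(c("2010-01-01","2010-01-10","2015-08-20",
                    "2011-01-01","2011-01-02","2011-01-03",
                    "2011-04-10","2011-04-20",
                    "2011-02-01","2011-03-17","2011-03-18","2011-03-20")),
  stringsAsFactors = FALSE
)

# The question says the data are already ordered; sorting again is harmless
df <- df[order(df$ID, df$date), ]

# Lag a vector by one position within each patient
lag1 <- function(x) c(NA, head(x, -1))

# Previous event and previous date (as day numbers) within each ID
df$parent <- ave(df$Event, df$ID, FUN = lag1)
prev_day  <- ave(as.numeric(df$date), df$ID, FUN = lag1)
df$datediff <- as.numeric(df$date) - prev_day   # gap in days

# One row per parent-child step; the first event of each patient has no parent
pairs <- df[!is.na(df$parent), c("ID", "parent", "Event", "datediff")]
names(pairs)[3] <- "child"
rownames(pairs) <- NULL

# Per-edge summary: how often each step occurs and the median gap in days
grp   <- pairs[c("parent", "child")]
edges <- aggregate(list(count = pairs$datediff), grp, FUN = length)
edges$median_days <- aggregate(list(m = pairs$datediff), grp, FUN = median)$m

# Optional plot, only in an interactive session with igraph installed
if (interactive() && requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges)
  plot(g, edge.width = igraph::E(g)$count,
       edge.label = igraph::E(g)$median_days)
}
```

pairs then has the ID / parent / child / datediff layout from the question, and edges can be passed to igraph in the same way as result above. Labelling the edges with the median gap is just one possible choice; any other summary of datediff could be used instead.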
Source: https://stackoverflow.com/questions/32365575/pathways-manipulate-list-of-events-in-parent-child-nodes-in-r