Question
I have a ragged data frame with each row as an occurrence in time of one or more entities, like so:
(time1) entitya entityf entityz
(time2) entityg entityh
(time3) entityo entityp entityk entityL
(time4) entityM
I want to create an edge list for network analysis from a subset of entities found in a second vector (nodelist). My problem is that I don't know:
1) How to keep only the entities that appear in the nodelist. I was considering
datanew<- subset(dataold, dataold %in% nodelist)
but it doesn't work.
2) How to turn the ragged data frame into a two-column edge list. In the above example, it would become:
entitya entityf
entitya entityz
entityz entityf
...
I have no idea how to do this. Any help is really appreciated!
Answer 1:
Try this:
# read your data
dat <- strsplit(readLines(textConnection("(time1) entitya entityf entityz
(time2) entityg entityh
(time3) entityo entityp entityk entityL
(time4) entityM")), " ")
# drop the leading (time) token from each row
dat <- lapply(dat, `[`, -1)
# keep only the entities that appear in nodelist
nodelist <- c("entitya", "entityf", "entityz", "entityg", "entityh",
"entityo", "entityp", "entityk")
dat <- lapply(dat, intersect, nodelist)
# create an edge matrix
t(do.call(cbind, lapply(dat[sapply(dat, length) >= 2], combn, 2)))
This last step might be a lot to digest, so here is a breakdown:
sapply(dat, length): computes the lengths of your list elements
dat[... >= 2]: only keeps the list elements with at least two items
lapply(..., combn, 2): creates all combinations: a list of wide matrices
do.call(cbind, ...): binds all the combinations into a wide matrix
t(...): transposes into a tall matrix
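The same steps can be wrapped in a small function, which also makes it easy to guard against the case where no row keeps two or more entities after filtering (in that case the one-liner above would fail, because do.call(cbind, ...) has nothing to bind). This is only a sketch: the name make_edge_list and the empty-matrix return value are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration).
# rows: a list of character vectors, one per time point, (time) already removed.
# nodelist: character vector of the entities to keep.
make_edge_list <- function(rows, nodelist) {
  # keep only the entities that appear in nodelist
  kept <- lapply(rows, intersect, nodelist)
  # a pair needs at least two entities
  kept <- kept[lengths(kept) >= 2]
  if (length(kept) == 0) {
    # assumption: with no edges, return an empty two-column character matrix
    return(matrix(character(0), ncol = 2))
  }
  # combn() returns a 2 x k matrix per row; bind them side by side, then transpose
  t(do.call(cbind, lapply(kept, combn, 2)))
}

# the sample data from the question
rows <- list(
  c("entitya", "entityf", "entityz"),
  c("entityg", "entityh"),
  c("entityo", "entityp", "entityk", "entityL"),
  "entityM"
)
nodelist <- c("entitya", "entityf", "entityz", "entityg", "entityh",
              "entityo", "entityp", "entityk")

edges <- make_edge_list(rows, nodelist)
print(edges)
```

With the sample data this gives a character matrix with seven rows: three pairs from time1, one from time2, and three from time3 (entityL is not in the nodelist). time4 contributes nothing because entityM is filtered out.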
Source: https://stackoverflow.com/questions/13782132/create-edge-list-from-ragged-data-frame-in-r-for-network-analysis