Question
I have a list of data indicating attendance at conferences, like this:
Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceC Jessica
I would like to create a binary indicator attendance matrix of the following format:
Event       John Joe Mary Ted Jessica
ConferenceA    1   1    1   0       0
ConferenceB    1   0    0   1       0
ConferenceC    0   0    0   0       1
Is there a way to do this in R?
Answer 1:
Assuming your data.frame is called "mydf", simply use table:
> table(mydf)
             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0
If there is a chance that someone attended a conference more than once, leading table to return a value greater than 1, you can simply recode all values greater than 1 to 1, like this:
temp <- table(mydf)
temp[temp > 1] <- 1
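As a rough sketch of the same idea, here is the recoding next to an alternative that drops duplicate rows with unique() before tabulating. The data frame "mydf2" and its repeated Ted row are made up for illustration and are not part of the original question.

```r
# Hypothetical sample data: Ted is listed twice for ConferenceB.
mydf2 <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# Approach from the answer: tabulate, then cap every count at 1.
temp <- table(mydf2)
temp[temp > 1] <- 1

# Alternative: remove duplicate rows first, so no count can exceed 1.
dedup <- table(unique(mydf2))

# Both approaches should produce the same 0/1 table.
all(temp == dedup)
```

Which one to prefer is mostly a matter of taste: recoding keeps the raw counts available in an intermediate step, while unique() avoids creating them at all.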
Note that this returns a table. If you want a data.frame to be returned, use as.data.frame.matrix:
> as.data.frame.matrix(table(mydf))
            Jessica Joe John Mary Ted
ConferenceA       0   1    1    1   0
ConferenceB       0   0    1    0   1
ConferenceC       1   0    0    0   0
In the above, "mydf" is defined as:
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA",
"ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"),
Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")),
.Names = c("Event", "Participant"), class = "data.frame",
row.names = c(NA, -6L))
Please share your data in a similar manner in the future.
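The output of table lists participants alphabetically and keeps the events as row names, whereas the question asks for participants in order of appearance and an explicit Event column. One possible way to get closer to that layout, assuming order of first appearance is what is wanted, is to set the factor levels yourself before tabulating. The names "wide" and "result" are just placeholders.

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# Use order of first appearance for the columns instead of alphabetical order.
mydf$Participant <- factor(mydf$Participant, levels = unique(mydf$Participant))

# Same conversion as in the answer: table -> data.frame in wide form.
wide <- as.data.frame.matrix(table(mydf))

# Move the row names into an ordinary "Event" column.
result <- cbind(Event = rownames(wide), wide)
rownames(result) <- NULL
result
```

The column order here follows the factor levels, so any other ordering can be had by passing a different levels vector.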
Answer 2:
@Ananda's answer is way better but I thought I'd throw up another approach using qdap. It only shines in the instance where "someone would have attended a conference more than once".
I included an instance when "someone would have attended a conference more than once" as pointed out by Ananda. In this case using the adjmat function and pulling out the Boolean matrix could be helpful.
Data With Double Attendee:
dat <- read.table(text="Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceB Ted
ConferenceC Jessica ", header=TRUE)
A table of counts:
library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)
## > wfm(dat[, 1], dat[, 2], lower.case = FALSE)
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   2
## conferenceC       1   0    0    0   0
With mtabulate:
with(dat, mtabulate(split(Participant, Event)))
##             Jessica Joe John Mary Ted
## ConferenceA       0   1    1    1   0
## ConferenceB       0   0    1    0   2
## ConferenceC       1   0    0    0   0
A Boolean matrix:
adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
## > adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   1
## conferenceC       1   0    0    0   0
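For comparison, a similar Boolean result can probably be had without qdap by testing the counts from base R's table against zero. This is only a sketch using the same sample data; the names "counts", "bool" and "indicator" are illustrative.

```r
dat <- read.table(text = "Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceB Ted
ConferenceC Jessica", header = TRUE)

# Raw counts: Ted at ConferenceB is counted twice.
counts <- table(dat)

# TRUE wherever a participant attended an event at least once.
bool <- counts > 0

# Convert TRUE/FALSE to 1/0 if numeric indicators are preferred.
indicator <- bool * 1L
indicator
```

Unlike the wfm output above, the row names here keep the original capitalization of the Event values.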
Source: https://stackoverflow.com/questions/17431524/create-a-binary-indicator-matrix-boolean-matrix-in-r