Question
Say I have a dataset where rows are classes people took:
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))
I.e.,
id class
1 1 "Math"
2 1 "English"
3 1 "Math"
4 2 "Reading"
5 2 "Math"
And I want to create a new dataset where rows are ids and the variables are class names, like this:
class.names <- names(table(attendance$class))
attedance2 <- matrix(nrow=length(table(attendance$id)),
                     ncol=length(class.names))
colnames(attedance2) <- class.names
attedance2 <- as.data.frame(attedance2)
attedance2$id <- unique(attendance$id)
I.e.,
English Math Reading id
1 NA NA NA 1
2 NA NA NA 2
I want to fill in the NAs with whether that particular id took that class or not. The values can be Yes/No, 1/0, or counts of the classes.
I.e.,
English Math Reading id
1 "Yes" "Yes" "No" 1
2 "No" "Yes" "Yes" 2
I'm familiar with dplyr, so it'd be easier for me if that was used in the solution but not necessary. Thank you for your help!
Answer 1:
Using:
library(reshape2)
attendance$val <- 'yes'
dcast(unique(attendance), id ~ class, value.var = 'val', fill = 'no')
gives:
  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes
A similar approach with data.table:
library(data.table)
dcast(unique(setDT(attendance))[,val:='yes'], id ~ class, value.var = 'val', fill = 'no')
Or with dplyr/tidyr:
library(dplyr)
library(tidyr)
attendance %>%
  distinct() %>%
  mutate(var = 'yes') %>%
  spread(class, var, fill = 'no')
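Since tidyr 1.0.0, spread() is superseded by pivot_wider(), so the same reshaping can be sketched with the newer function. This is only an illustration under a few assumptions: the column name taken is made up for the example, and values_fill is given as a named list so that it should work across tidyr 1.x releases.

```r
library(dplyr)
library(tidyr)

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

wide <- attendance %>%
  distinct() %>%                       # drop repeated (id, class) rows
  mutate(taken = "yes") %>%            # 'taken' is an illustrative column name
  pivot_wider(names_from = class,      # one column per class
              values_from = taken,
              values_fill = list(taken = "no"))  # classes an id never took

wide
```

Unlike spread(), pivot_wider() keeps the class columns in order of first appearance rather than sorting them alphabetically.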
Another, somewhat more convoluted option might be to reshape first and then replace the counts with yes and no (when several rows fall into the same cell and no aggregation function is given, dcast defaults to length, which produces counts):
att2 <- dcast(attendance, id ~ class, value.var = 'class')
which gives:
  id English Math Reading
1  1       1    2       0
2  2       0    1       1
Now you can replace the count with:
# create an index of the counts that are above zero
idx <- att2[,-1] > 0
# replace the non-zero values with 'yes'
att2[,-1][idx] <- 'yes'
# replace the zero values with 'no'
att2[,-1][!idx] <- 'no'
which finally gives:
> att2
  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes
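The count-then-replace route can also be written with the aggregation function spelled out, which avoids relying on the default that dcast picks. The following is a minimal sketch, assuming reshape2 is installed; the names counts and att3 are illustrative only.

```r
library(reshape2)

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# count how often each id took each class; length is passed explicitly
counts <- dcast(attendance, id ~ class, value.var = "class",
                fun.aggregate = length)

# turn every count column (everything except id) into yes/no in one pass
att3 <- counts
att3[-1] <- lapply(counts[-1], function(n) ifelse(n > 0, "yes", "no"))

att3
```

Keeping counts as a separate object means the original counts stay available if they are needed later.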
Answer 2:
We can do this with base R:
attendance$val <- "yes"
d1 <- reshape(attendance, idvar = 'id', direction = 'wide', timevar = 'class')
d1[is.na(d1)] <- "no"
names(d1) <- sub("val\\.", '', names(d1))
d1
# id Math English Reading
#1 1 yes yes no
#4 2 yes no yes
Or with xtabs:
xtabs(val ~id + class, transform(unique(attendance), val = 1))
# class
# id English Math Reading
# 1 1 1 0
# 2 0 1 1
NOTE: The binary values can easily be converted to 'yes'/'no', but it is better to keep either 1/0 or TRUE/FALSE.
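If the yes/no labels are still wanted, one possible way to convert the xtabs result is sketched below in base R. The object names tab, wide and yesno are illustrative, and as.data.frame.matrix() is used so that the table becomes one row per id rather than long format.

```r
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# 1/0 indicator table: one row per id, one column per class
tab <- xtabs(val ~ id + class, transform(unique(attendance), val = 1))

# keep the wide layout when converting the table to a data frame
wide <- as.data.frame.matrix(tab)

# relabel the indicators; the ids stay in the row names
yesno <- wide
yesno[] <- lapply(wide, function(x) ifelse(x > 0, "yes", "no"))

yesno
```

The numeric version in wide remains the more convenient one for further computation, for example rowSums(wide) gives the number of distinct classes per id.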
Source: https://stackoverflow.com/questions/44488605/turn-long-dataset-of-classes-taken-into-wide-dataset-where-variables-are-dummy-c