Question
This topic has probably been brought up before, and I guess it has quite a simple solution. However, I haven't been able to work it out so far. Let's say I have a data.frame (called "data") which contains 10 individuals (id) on which I collected observations at 3 time points (T):
> data <- data.frame(id = rep(c(1:10), 3),
T = gl(3, 10),
X = sample(1:30),
Y = sample(c("yes", "no"), 30, replace = TRUE),
Z = sample(1:40, 30),
Z2 = rnorm(30, mean = 5, sd = 0.5))
> head(data)
id T X Y Z Z2
1 1 1 10 yes 15 5.993605
2 2 1 18 no 22 6.096566
3 3 1 5 no 24 5.101393
4 4 1 15 yes 18 4.944108
5 5 1 23 no 34 4.634176
6 6 1 13 no 27 5.576015
I would like to create a subset of this data.frame (a new data.frame called data2) by selecting only the individuals that have "yes" (variable Y) at each of the three time points (variable T), that is, Y="yes" for T=1 and T=2 and T=3.
I know that conditions can be combined with the "&" operator, and this could be used to relate the conditions for the 3 time points. However, my problem is writing the condition for each time point: how do I tell R that I want the subjects for which Y="yes" at T="1", for example?
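To answer the question as literally posed, one condition per time point can be written by combining the T test and the Y test with `&`, collecting the ids that pass at each time point, and keeping only the ids present in all three sets. A minimal sketch, assuming a small hypothetical toy data frame (3 individuals, 3 time points) in place of the random one above so the result is predictable:

```r
# Hypothetical toy data: 3 individuals (id) observed at 3 time points (T)
data <- data.frame(
  id = rep(1:3, 3),
  T  = gl(3, 3),
  Y  = c("yes", "yes", "no",    # T = 1
         "yes", "no",  "yes",   # T = 2
         "yes", "yes", "yes"),  # T = 3
  stringsAsFactors = FALSE
)

# One condition per time point: the ids with Y == "yes" at that T
yes_t1 <- data$id[data$T == "1" & data$Y == "yes"]
yes_t2 <- data$id[data$T == "2" & data$Y == "yes"]
yes_t3 <- data$id[data$T == "3" & data$Y == "yes"]

# Keep only the ids that satisfy all three conditions
keep_ids <- intersect(intersect(yes_t1, yes_t2), yes_t3)
data2 <- data[data$id %in% keep_ids, ]
print(data2)  # only id 1 remains (rows 1, 4 and 7)
```

With more time points, the same idea generalizes to `Reduce(intersect, lapply(levels(data$T), function(t) data$id[data$T == t & data$Y == "yes"]))`. Because an id must appear in every per-time-point set, this version also drops individuals that are missing an observation at some time point.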
Thank you very much in advance to all. Have a great day,
Denis
Answer 1:
You can do:
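# For each id, TRUE if Y is "yes" at every time point (one logical value per id)
keep.ids <- tapply(data$Y, data$id, FUN = function(x) all(x == "yes"))
# Look up each row's id in keep.ids and keep the rows whose id passed
subset(data, keep.ids[factor(id)])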
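A more compact base-R variant of the same idea (a swapped-in idiom, not part of the original answer) uses `ave()` to run the per-id `all()` test and spread its result back over every row of that id, which gives a logical vector that can index the rows directly. Sketched on a hypothetical toy data frame:

```r
# Hypothetical toy data: 3 individuals (id) observed at 3 time points (T)
data <- data.frame(
  id = rep(1:3, 3),
  T  = gl(3, 3),
  Y  = c("yes", "yes", "no",    # T = 1
         "yes", "no",  "yes",   # T = 2
         "yes", "yes", "yes"),  # T = 3
  stringsAsFactors = FALSE
)

# TRUE/FALSE per row: is Y "yes" on this row?
is_yes <- data$Y == "yes"

# ave() applies all() within each id and recycles the result to every row of that id
keep <- ave(is_yes, data$id, FUN = all)

data2 <- data[keep, ]
print(data2)
```

`ave()` returns a vector of the same type as its first argument, so starting from a logical vector yields a logical row index of the same length as `data`.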
Or use the plyr package:
library(plyr)
# Split by id; return the piece unchanged if every Y is "yes", otherwise drop it
ddply(data, "id", function(x) if (all(x$Y == "yes")) x else NULL)
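The `ddply()` call splits the data by id, applies a function to each piece, and binds the results back together. The same split-apply-combine pattern can be sketched in base R without plyr (an alternative, not what the answer uses), again on a hypothetical toy data frame:

```r
# Hypothetical toy data: 3 individuals (id) observed at 3 time points (T)
data <- data.frame(
  id = rep(1:3, 3),
  T  = gl(3, 3),
  Y  = c("yes", "yes", "no",    # T = 1
         "yes", "no",  "yes",   # T = 2
         "yes", "yes", "yes"),  # T = 3
  stringsAsFactors = FALSE
)

pieces <- split(data, data$id)                         # one data frame per id
kept <- Filter(function(d) all(d$Y == "yes"), pieces)  # keep ids with "yes" everywhere
data2 <- do.call(rbind, kept)                          # bind the kept pieces together
rownames(data2) <- NULL                                # drop the composite row names
print(data2)
```

The result is ordered by id rather than by the original row order. If no individual qualifies, `do.call(rbind, list())` returns `NULL` rather than an empty data frame, so it is worth checking for that before using `data2`.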
Source: https://stackoverflow.com/questions/16761315/selection-of-observations-by-combining-criteria-in-r