Selection of observations by combining criteria in R

坚强是说给别人听的谎言 submitted on 2019-12-25 04:08:44

Question


This topic has probably been brought up before, and I guess the solution is quite simple. However, I have not managed to work it out so far. Let's say I have a data.frame (called "data") which contains 10 individuals (id) on which I collected observations at 3 time points (T):

data <- data.frame(id = rep(1:10, 3),
                   T  = gl(3, 10),
                   X  = sample(1:30),
                   Y  = sample(c("yes", "no"), 30, replace = TRUE),
                   Z  = sample(1:40, 30),
                   Z2 = rnorm(30, mean = 5, sd = 0.5))

    > head(data)
      id T  X   Y  Z       Z2
    1  1 1 10 yes 15 5.993605
    2  2 1 18  no 22 6.096566
    3  3 1  5  no 24 5.101393
    4  4 1 15 yes 18 4.944108
    5  5 1 23  no 34 4.634176
    6  6 1 13  no 27 5.576015

I would like to create a subset of this data.frame (a new data.frame called data2) by selecting only the individuals that have "yes" (variable Y) at each of the three time points (variable T), that is, Y = "yes" for T = 1 and T = 2 and T = 3.

I know that conditions can be combined with the "&" operator, and that this could be used to link the conditions for the 3 time points. However, my problem is writing the condition for each individual time point: how do I tell R that I want the subjects for which Y = "yes" at T = "1", for example?

Thank you very much in advance to all. Have a great day,

Denis


Answer 1:


You can do:

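# One logical per id: TRUE only if Y is "yes" at every time point
keep.ids <- tapply(data$Y, data$id, FUN = function(x) all(x == "yes"))
# Expand the per-id flags back to one value per row and keep the matching rows
subset(data, keep.ids[factor(id)])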
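
To make it easier to see what the `tapply` step computes, here is a minimal sketch on a small hand-made data frame. The data frame `toy` and its values are made up for illustration and are not taken from the question; the lookup by `as.character(id)` is a variation on the code above, used here so that the result does not depend on the order of factor levels.

```r
# Hypothetical toy data: 3 individuals observed at 3 time points.
toy <- data.frame(id = rep(1:3, 3),
                  T  = gl(3, 3),
                  Y  = c("yes", "yes", "no",
                         "yes", "no",  "yes",
                         "yes", "yes", "yes"),
                  stringsAsFactors = FALSE)

# One logical per id: TRUE only if every Y for that id is "yes".
keep.ids <- tapply(toy$Y, toy$id, FUN = function(x) all(x == "yes"))

# Look up each row's flag by the id's name, then keep the flagged rows.
keep <- as.vector(keep.ids[as.character(toy$id)])
toy2 <- toy[keep, ]
```

In this toy example only individual 1 answers "yes" at all three time points, so `toy2` should contain that individual's three rows.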

Or use the plyr package:

library(plyr)
# Split by id; return a group's rows only if all of its Y values are "yes"
ddply(data, "id", function(x) if (all(x$Y == "yes")) x else NULL)
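
The question's literal idea (one condition per time point, with the results then combined) can also be written out explicitly. The following is only a sketch under assumptions: the data frame `toy` and the helper names `yes1`, `yes2`, `yes3` and `keep` are invented for illustration, and the approach gets verbose as the number of time points grows, which is why the grouped solutions above are usually preferable.

```r
# Hypothetical toy data: 3 individuals observed at 3 time points.
toy <- data.frame(id = rep(1:3, 3),
                  T  = gl(3, 3),
                  Y  = c("yes", "yes", "no",
                         "yes", "no",  "yes",
                         "yes", "yes", "yes"),
                  stringsAsFactors = FALSE)

# ids that have Y == "yes" at each individual time point
yes1 <- toy$id[toy$T == "1" & toy$Y == "yes"]
yes2 <- toy$id[toy$T == "2" & toy$Y == "yes"]
yes3 <- toy$id[toy$T == "3" & toy$Y == "yes"]

# ids that satisfy all three conditions at once
keep <- Reduce(intersect, list(yes1, yes2, yes3))

# Keep every row belonging to those ids
toy2 <- toy[toy$id %in% keep, ]
```

Here `&` combines the two conditions within a single time point, while `intersect` plays the role of "and" across time points, because each time point lives on a different row.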


Source: https://stackoverflow.com/questions/16761315/selection-of-observations-by-combining-criteria-in-r
