Subset data frame to include only levels of one factor that have values in both levels of another factor

Submitted by 我与影子孤独终老i on 2019-12-06 23:43:35

Here is one option with data.table (note that setDT converts d to a data.table by reference):

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]

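Or a base R option with ave, which keeps every ID that has more than one distinct age (the same as having both levels when age has only two levels):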

d[with(d, ave(as.character(age), ID, FUN = function(x) length(unique(x)))>1),]
#   ID      age       size
#1  a1 juvenile -1.4545407
#2  a2 juvenile -0.4695317
#3  a3 juvenile  0.2271316
#5  a1 juvenile  0.2961210
#6  a2    adult -0.8331993
#9  a1    adult -0.6924967
#10 a3    adult -0.4619550
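
As far as I can tell, because ave is handed a character vector in the one-liner above, the per-ID counts come back as strings and the `> 1` ends up being a string comparison. It works here, but a numeric version may be less fragile. A minimal sketch, assuming a made-up data frame `d` whose IDs and ages mimic the printed row pattern (the `size` values and the IDs `a4` and `a5` are invented for illustration):

```r
# Hypothetical example data: a4 has only juveniles, a5 has only adults
d <- data.frame(
  ID   = c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a5", "a1", "a3"),
  age  = c("juvenile", "juvenile", "juvenile", "juvenile", "juvenile",
           "adult", "adult", "adult", "adult", "adult"),
  size = c(0.5, -1.2, 0.3, 1.1, -0.4, 0.8, -0.9, 0.2, 1.5, -0.6),
  stringsAsFactors = TRUE
)

# Number of distinct ages per ID, repeated for every row of that ID.
# Passing integer codes keeps the result numeric.
n_ages <- ave(as.integer(factor(d$age)), d$ID,
              FUN = function(x) length(unique(x)))

# Keep rows whose ID has more than one distinct age
result <- d[n_ages > 1, ]
```

Keep in mind that this only counts distinct ages. If age could take a third level, an ID with, say, adult plus that third level would also pass, so the explicit `all(c("juvenile", "adult") %in% age)` test is the safer choice in that case.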

With dplyr, you can use group_by %>% filter:

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160
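
The data.table and dplyr answers both rely on the same per-group test, `all(c("juvenile", "adult") %in% age)`. To see what that test evaluates to for each ID, here is a rough base R sketch using tapply. The data frame is hypothetical (the `size` values and the IDs `a4` and `a5` are made up), and `has_both` and `keep_ids` are just illustrative names:

```r
# Hypothetical example data: a4 has only juveniles, a5 has only adults
d <- data.frame(
  ID   = c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a5", "a1", "a3"),
  age  = c("juvenile", "juvenile", "juvenile", "juvenile", "juvenile",
           "adult", "adult", "adult", "adult", "adult"),
  size = c(0.5, -1.2, 0.3, 1.1, -0.4, 0.8, -0.9, 0.2, 1.5, -0.6),
  stringsAsFactors = TRUE
)

# One TRUE/FALSE per ID: does this ID contain both age levels?
has_both <- tapply(as.character(d$age), d$ID,
                   function(a) all(c("juvenile", "adult") %in% a))

# IDs that pass the test
keep_ids <- names(has_both)[has_both]

# Keep every row belonging to one of those IDs
result <- d[d$ID %in% keep_ids, ]
```

The grouped filter does the same thing in one step: the test yields a single logical per group, and all rows of a group are kept or dropped together.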

Split the IDs by age, intersect the resulting sets, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
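
The one-liner above packs three steps together. Unrolled, it might look like the sketch below, again on a hypothetical data frame (invented `size` values, invented IDs `a4` and `a5`); `ids_by_age` and `common_ids` are names chosen here for illustration:

```r
# Hypothetical example data: a4 has only juveniles, a5 has only adults
d <- data.frame(
  ID   = c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a5", "a1", "a3"),
  age  = c("juvenile", "juvenile", "juvenile", "juvenile", "juvenile",
           "adult", "adult", "adult", "adult", "adult"),
  size = c(0.5, -1.2, 0.3, 1.1, -0.4, 0.8, -0.9, 0.2, 1.5, -0.6),
  stringsAsFactors = TRUE
)

# Step 1: one vector of IDs per age level
ids_by_age <- split(as.character(d$ID), d$age)

# Step 2: IDs present in every age level
common_ids <- Reduce(intersect, ids_by_age)

# Step 3: keep rows whose ID is in that common set
result <- d[d$ID %in% common_ids, ]
```

Because Reduce folds intersect over all list elements, the same code should also cover an age factor with more than two levels, keeping only IDs that appear in every level.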