Subset data frame to include only levels of one factor that have values in both levels of another factor

Posted by 心已入冬 on 2019-12-10 11:44:03

Question


I am working with a data frame that deals with numeric measurements. Some individuals have been measured several times, both as juveniles and adults. A reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

My goal is to subset that data frame by selecting the IDs that appear at least once as a juvenile and at least once as an adult, but I am not sure how to do that.

The resulting dataframe would contain all measurements for individuals a1, a2 and a3, but would exclude a4, a5 and a6, as they were not measured at both stages.

A similar question was asked 7 months ago but never received an answer (Subset data frame to include only levels one factor that have values in both levels of another factor)

Thanks!


Answer 1:


Here is one option with data.table (note that setDT converts d to a data.table by reference):

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]

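Or a base R option with ave, which keeps the IDs that have more than one distinct age (sufficient here because age has only two levels):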

d[with(d, ave(as.character(age), ID, FUN = function(x) length(unique(x)))>1),]
#   ID      age       size
#1  a1 juvenile -1.4545407
#2  a2 juvenile -0.4695317
#3  a3 juvenile  0.2271316
#5  a1 juvenile  0.2961210
#6  a2    adult -0.8331993
#9  a1    adult -0.6924967
#10 a3    adult -0.4619550
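
The ave approach above counts distinct ages per ID. As a minimal sketch of the same idea that names both stages explicitly, the following uses tapply to flag each ID. The deterministic size values and the names has_both, keep_ids and res are assumptions made for illustration and are not part of the original answer.

```r
# Rebuild the example data; size uses fixed placeholder values
# (an assumption, replacing rnorm) so the result is repeatable.
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- seq_along(ID)
d <- data.frame(ID, age, size, stringsAsFactors = FALSE)

# For each ID, TRUE when both stages occur among its rows.
has_both <- tapply(d$age, d$ID,
                   function(x) all(c("juvenile", "adult") %in% x))

# Keep every measurement belonging to a flagged ID.
keep_ids <- names(has_both)[has_both]
res <- d[d$ID %in% keep_ids, ]
print(res)
```

Unlike the distinct-count test, this check would still behave as intended if a third age level were added later.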



Answer 2:


With dplyr, you can use group_by %>% filter:

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160



Answer 3:


Split the IDs by age, intersect the resulting groups, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
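
The split-and-intersect idea can also be expressed as a cross-tabulation, which may be easier to inspect when debugging. This is a hedged sketch rather than part of the original answer: the fixed size values and the names tab, complete_ids and res_tab are assumptions chosen for illustration.

```r
# Rebuild the example data; size uses fixed placeholder values
# (an assumption, replacing rnorm) so the result is repeatable.
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- seq_along(ID)
d <- data.frame(ID, age, size, stringsAsFactors = FALSE)

# Cross-tabulate ID by age: one row per ID, one column per age value.
tab <- table(d$ID, d$age)

# Keep IDs that have at least one measurement in every age column.
complete_ids <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
res_tab <- d[d$ID %in% complete_ids, ]
print(res_tab)
```

Printing tab shows the number of measurements per ID and stage, so it is easy to see why a4, a5 and a6 are dropped.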


Source: https://stackoverflow.com/questions/40557147/subset-data-frame-to-include-only-levels-one-factor-that-have-values-in-both-lev
