I have an R dataset x as below:
ID Month
1 1 Jan
2 3 Jan
3 4 Jan
4 6 Jan
5 6 Jan
6 9 Jan
7 2 Feb
8 4 Feb
9 6 Feb
10 8
First, split df$ID by Month and use intersect to find the elements common to each sub-group.
Reduce(intersect, split(df$ID, df$Month))
#[1] 4 6
If you want to subset the corresponding data.frame, do
df[df$ID %in% Reduce(intersect, split(df$ID, df$Month)),]
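As a minimal, reproducible sketch of the split/intersect idea: the toy data frame below is made up for illustration (it is not the asker's dataset), and the names toy, by_month, common_ids and common_rows are hypothetical.

```r
# Made-up toy data, for illustration only (not the dataset from the question)
toy <- data.frame(
  ID    = c(1, 4, 6, 6, 2, 4, 6, 4, 6, 7),
  Month = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar")
)

# One vector of IDs per month
by_month <- split(toy$ID, toy$Month)

# Fold intersect() over the list: keeps only IDs present in every month
common_ids <- Reduce(intersect, by_month)
common_ids
#[1] 4 6

# Rows belonging to those IDs (duplicates within a month are kept)
common_rows <- toy[toy$ID %in% common_ids, ]
```

Because intersect drops duplicates, an ID repeated within a month (6 in Jan here) appears only once in the result.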
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'ID', and get the row index (.I) for the groups where the number of unique 'Month' values equals the number of unique 'Month' values in the whole dataset. Then subset the data with that index.
library(data.table)
setDT(df1)[df1[, .I[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1]
# ID Month
# 1: 4 Jan
# 2: 4 Feb
# 3: 4 Mar
# 4: 4 Apr
# 5: 4 May
# 6: 4 Jun
# 7: 6 Jan
# 8: 6 Jan
# 9: 6 Feb
#10: 6 Mar
#11: 6 Apr
#12: 6 May
#13: 6 Jun
To extract the 'ID's
setDT(df1)[, ID[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1
#[1] 4 6
Or with base R
1) Using table with rowSums
v1 <- rowSums(table(df1) > 0)
names(v1)[v1==max(v1)]
#[1] "4" "6"
This info can be used for subsetting the data
subset(df1, ID %in% names(v1)[v1 == max(v1)])
2) Using tapply
lst <- with(df1, tapply(Month, ID, FUN = unique))
names(which(lengths(lst) == length(unique(df1$Month))))
#[1] "4" "6"
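The table approach in 1) can be unpacked step by step. The sketch below uses a made-up toy data frame, and presence, months_per_id and full_ids are illustrative names. It compares against the number of month columns rather than max(v1); that is my own variation, chosen because max(v1) returns the IDs seen in the most months even when no ID covers all of them.

```r
# Made-up toy data, for illustration only (not the dataset from the question)
toy <- data.frame(
  ID    = c(1, 4, 6, 6, 2, 4, 6, 4, 6, 7),
  Month = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar")
)

# Logical matrix: one row per ID, one column per month,
# TRUE where the ID occurs at least once in that month
presence <- table(toy$ID, toy$Month) > 0

# Number of distinct months in which each ID occurs
months_per_id <- rowSums(presence)

# IDs that occur in every month (names are character, as in the output above)
full_ids <- names(months_per_id)[months_per_id == ncol(presence)]
full_ids
#[1] "4" "6"
```

If numeric IDs are needed for later subsetting, as.numeric(full_ids) converts them back.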
Or using dplyr
library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(n_distinct(Month) == n_distinct(df1$Month)) %>%
  .$ID %>%
  unique
#[1] 4 6
Or, if we need to get the rows
df1 %>%
  group_by(ID) %>%
  filter(n_distinct(Month) == n_distinct(df1$Month))
# A tibble: 13 x 2
# Groups: ID [2]
# ID Month
# <int> <chr>
# 1 4 Jan
# 2 6 Jan
# 3 6 Jan
# 4 4 Feb
# 5 6 Feb
# 6 4 Mar
# 7 6 Mar
# 8 4 Apr
# 9 6 Apr
#10 4 May
#11 6 May
#12 4 Jun
#13 6 Jun