Extract elements common in all column groups

悲哀的现实 2020-11-28 16:21

I have an R dataset x as below:

  ID Month
1   1   Jan
2   3   Jan
3   4   Jan
4   6   Jan
5   6   Jan
6   9   Jan
7   2   Feb
8   4   Feb
9   6   Feb
10  8           


        
2 answers
  • 2020-11-28 16:41

    First, split df$ID by Month, then fold intersect over the resulting list with Reduce to find the elements common to all sub-groups.

    Reduce(intersect, split(df$ID, df$Month))
    #[1] 4 6
    

    If you want to subset the corresponding data.frame, do

    df[df$ID %in% Reduce(intersect, split(df$ID, df$Month)),]
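
    Since the dataset in the question is cut off, here is a minimal reproducible sketch of the same idea. The data frame below is made up for illustration (three months, with IDs 4 and 6 present in every month); the names by_month, common and common_rows are likewise only illustrative.

```r
# Hypothetical sample data, NOT the question's full dataset
df <- data.frame(
  ID    = c(1, 3, 4, 6, 6, 9, 2, 4, 6, 8, 4, 5, 6),
  Month = c("Jan", "Jan", "Jan", "Jan", "Jan", "Jan",
            "Feb", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar")
)

# split() returns a named list with one vector of IDs per month
by_month <- split(df$ID, df$Month)

# Reduce() applies intersect() pairwise across the list,
# leaving only the IDs that occur in every month
common <- Reduce(intersect, by_month)
common

# Keep only the rows whose ID is one of the common IDs
common_rows <- df[df$ID %in% common, ]
common_rows
```

    Inspecting by_month before calling Reduce is a quick way to check that the grouping is what you expect.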
    
  • 2020-11-28 16:57

    We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'ID', get the row indices (.I) of the groups whose number of unique 'Month' values equals the number of unique 'Month' values in the whole dataset, and subset the data with those indices.

    library(data.table)
    setDT(df1)[df1[, .I[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1]
    #    ID Month
    # 1:  4   Jan
    # 2:  4   Feb
    # 3:  4   Mar
    # 4:  4   Apr
    # 5:  4   May
    # 6:  4   Jun
    # 7:  6   Jan
    # 8:  6   Jan
    # 9:  6   Feb
    #10:  6   Mar
    #11:  6   Apr
    #12:  6   May
    #13:  6   Jun
    

    To extract the 'ID's

    setDT(df1)[, ID[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1
    #[1] 4 6
    

    Or with base R

    1) Using table with rowSums

    v1 <- rowSums(table(df1) > 0)
    names(v1)[v1==max(v1)]
    #[1] "4" "6"
    

    This info can be used for subsetting the data

    subset(df1, ID %in% names(v1)[v1 == max(v1)])
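
    One caveat with comparing against max(v1): if no ID appears in every month, the maximum is smaller than the number of months and the comparison still returns the IDs with the widest coverage. A possible variant, sketched here on made-up data, compares against the number of table columns instead. The data and the names tab, months_per_id, ids_all, df_none, ids_by_max and ids_none are assumptions for illustration.

```r
# Hypothetical sample data, NOT the question's full dataset
df1 <- data.frame(
  ID    = c(1, 3, 4, 6, 6, 9, 2, 4, 6, 8, 4, 5, 6),
  Month = c("Jan", "Jan", "Jan", "Jan", "Jan", "Jan",
            "Feb", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar")
)

# Rows are IDs, columns are months, cells are occurrence counts
tab <- table(df1$ID, df1$Month)

# Number of distinct months in which each ID occurs
months_per_id <- rowSums(tab > 0)

# IDs (as character names) that occur in every month
ids_all <- names(months_per_id)[months_per_id == ncol(tab)]

# Edge case: drop the "Mar" rows of IDs 4 and 6, so no ID covers all months
df_none <- df1[!(df1$Month == "Mar" & df1$ID %in% c(4, 6)), ]
tab2 <- table(df_none$ID, df_none$Month)
v2 <- rowSums(tab2 > 0)

ids_by_max <- names(v2)[v2 == max(v2)]    # still reports "4" and "6"
ids_none   <- names(v2)[v2 == ncol(tab2)] # empty, as intended
```

    Note that the result is a character vector of names; wrap it in as.numeric() if numeric IDs are needed.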
    

    2) Using tapply

    lst <- with(df1, tapply(Month, ID, FUN = unique))
    names(which(lengths(lst) == length(unique(df1$Month))))
    #[1] "4" "6"
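
    If this check is needed in several places, the tapply approach could be wrapped in a small helper. The function name ids_in_all_groups and the sample data are hypothetical, not part of base R or of the question.

```r
# Hypothetical helper: return the ids (as character) that occur in every group
ids_in_all_groups <- function(id, group) {
  n_groups <- length(unique(group))
  # Count the distinct groups seen for each id
  per_id <- tapply(group, id, FUN = function(g) length(unique(g)))
  names(per_id)[per_id == n_groups]
}

# Hypothetical sample data, NOT the question's full dataset
df1 <- data.frame(
  ID    = c(1, 3, 4, 6, 6, 9, 2, 4, 6, 8, 4, 5, 6),
  Month = c("Jan", "Jan", "Jan", "Jan", "Jan", "Jan",
            "Feb", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar")
)

ids <- ids_in_all_groups(df1$ID, df1$Month)
ids
```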
    

    Or using dplyr

    library(dplyr)
    df1 %>%
         group_by(ID) %>%
         filter(n_distinct(Month)== n_distinct(df1$Month)) %>%
         .$ID %>%
         unique
    #[1] 4 6
    

    Or, if we need the rows:

    df1 %>%
         group_by(ID) %>%
         filter(n_distinct(Month)== n_distinct(df1$Month))
    # A tibble: 13 x 2
    # Groups:   ID [2]
    #      ID Month
    #   <int> <chr>
    # 1     4   Jan
    # 2     6   Jan
    # 3     6   Jan
    # 4     4   Feb
    # 5     6   Feb
    # 6     4   Mar
    # 7     6   Mar
    # 8     4   Apr
    # 9     6   Apr
    #10     4   May
    #11     6   May
    #12     4   Jun
    #13     6   Jun
    