Subsetting a data frame to the rows not appearing in another data frame

后端 未结 5 895
温柔的废话
温柔的废话 2021-01-23 13:53

I have a data frame A with observations

    Var1   Var2  Var3
     1       3    4
     2       5    6
     4       5    7
     4       5    8
     6       7    9         


        
相关标签:
5条回答
  • 2021-01-23 14:22

    Using sqldf is an option.

    require(sqldf)
    
    C <- sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')
    
    0 讨论(0)
  • 2021-01-23 14:22

    One approach could be to paste all the columns of A and B together, limiting to the rows in A whose pasted representation doesn't appear in the pasted representation of B:

    A[!(do.call(paste, A) %in% do.call(paste, B)),]
    #   Var1 Var2 Var3
    # 3    4    5    7
    # 4    4    5    8
    # 5    6    7    9
    

    One obvious downside of this approach is that it assumes two rows with the same pasted representation are in fact identical. Here is a slightly more clunky approach that doesn't have this limitation:

    combined <- rbind(B, A)
    combined[!duplicated(combined) & seq_len(nrow(combined)) > length(B),]
    #   Var1 Var2 Var3
    # 5    4    5    7
    # 6    4    5    8
    # 7    6    7    9
    

    Basically I used rbind to append A below B and then limited to rows that are both non-duplicated and that are not originally from B.

    0 讨论(0)
  • 2021-01-23 14:26

    dplyr has a nice anti_join function that does exactly that:

    > library(dplyr)
    > anti_join(A, B)
    Joining by: c("Var1", "Var2", "Var3")
      Var1 Var2 Var3
    1    6    7    9
    2    4    5    8
    3    4    5    7
    
    0 讨论(0)
  • 2021-01-23 14:39

    Using data.table you could do an anti-join as follows:

    library(data.table)
    setDT(df1)[!df2, on = names(df1)]
    

    which gives the desired result:

       Var1 Var2 Var3
    1:    4    5    7
    2:    4    5    8
    3:    6    7    9
    
    0 讨论(0)
  • 2021-01-23 14:47

    Another option:

    C <- rbind(A, B)
    C[!(duplicated(C) | duplicated(C, fromLast = TRUE)), ]
    

    Output:

      Var1 Var2 Var3
    3    4    5    7
    4    4    5    8
    5    6    7    9
    
    0 讨论(0)
提交回复
热议问题