Subsetting a data frame to the rows not appearing in another data frame

后端未结

关注

 5  902

I have a data frame A with observations

    Var1   Var2  Var3
     1       3    4
     2       5    6
     4       5    7
     4       5    8
     6       7    9


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2021-01-23 14:22
              
            
            
                                                                       
Using sqldf is an option.

require(sqldf)

C <- sqldf('SELECT * FROM A EXCEPT SELECT * FROM B')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  没有蜡笔的小新        
                
              
                            
                2021-01-23 14:22
              
            
            
                                                                       
One approach could be to paste all the columns of A and B together, limiting to the rows in A whose pasted representation doesn't appear in the pasted representation of B:

A[!(do.call(paste, A) %in% do.call(paste, B)),]
#   Var1 Var2 Var3
# 3    4    5    7
# 4    4    5    8
# 5    6    7    9


One obvious downside of this approach is that it assumes two rows with the same pasted representation are in fact identical. Here is a slightly more clunky approach that doesn't have this limitation:

combined <- rbind(B, A)
combined[!duplicated(combined) & seq_len(nrow(combined)) > length(B),]
#   Var1 Var2 Var3
# 5    4    5    7
# 6    4    5    8
# 7    6    7    9


Basically I used rbind to append A below B and then limited to rows that are both non-duplicated and that are not originally from B.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  佛祖请我去吃肉        
                
              
                            
                2021-01-23 14:26
              
            
            
                                                                       
dplyr has a nice anti_join function that does exactly that:

> library(dplyr)
> anti_join(A, B)
Joining by: c("Var1", "Var2", "Var3")
  Var1 Var2 Var3
1    6    7    9
2    4    5    8
3    4    5    7

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  迷失自我        
                
              
                            
                2021-01-23 14:39
              
            
            
                                                                       
Using data.table you could do an anti-join as follows:

library(data.table)
setDT(df1)[!df2, on = names(df1)]


which gives the desired result:

   Var1 Var2 Var3
1:    4    5    7
2:    4    5    8
3:    6    7    9

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  灰色年华        
                
              
                            
                2021-01-23 14:47
              
            
            
                                                                       
Another option:

C <- rbind(A, B)
C[!(duplicated(C) | duplicated(C, fromLast = TRUE)), ]


Output:

  Var1 Var2 Var3
3    4    5    7
4    4    5    8
5    6    7    9

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复