Remove rows in data.table according to another data.table


I have a data.table named dtA:

My actual dtA has 62871932 rows and 3 columns:

  date    company    value
19810


                      
2 answers

        
                         				            
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2021-01-06 05:22
              
            
            
                                                                       
I think I know how to solve this:

In dtB, I add a marker column named pointer using data.table syntax:

dtB[, pointer := 1]


dtB will then look like this:

  date    company    value    pointer
198101          A        2          1
198102          B        5          1


Then I use the LEFT OUTER JOIN method described here:
https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html

setkey(dtA, date, company, value)
setkey(dtB, date, company, value)
dtA = merge(dtA, dtB, all.x = TRUE)


After the join, the pointer column is 1 for every row of dtA that also appears in dtB, and NA for every row of dtA that has no match in dtB.

The result will be:

  date    company    value    pointer
198101          A        1         NA
198101          A        2          1
198101          B        5         NA
198102          A        2         NA
198102          B        5          1
198102          B        6         NA


I then keep the rows where pointer is NA and drop the pointer column:

dtA = dtA[is.na(pointer)][, -c("pointer")]


I get my result:

  date    company    value
198101          A        1
198101          B        5
198102          A        2
198102          B        6
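
For anyone who wants to try the steps above end to end, here is a minimal sketch. The toy dtA and dtB are reconstructed from the tables shown in this answer, not taken from the real 62871932-row data, and the names marked, joined and result are only illustrative. Unlike the snippets above, it marks a copy of dtB so the original dtB does not gain a pointer column, and it passes the join columns through by instead of relying on keys.

```r
library(data.table)

# Toy data reconstructed from the tables in this answer
dtA <- data.table(
  date    = c(198101, 198101, 198101, 198102, 198102, 198102),
  company = c("A", "A", "B", "A", "B", "B"),
  value   = c(1, 2, 5, 2, 5, 6)
)
dtB <- data.table(
  date    = c(198101, 198102),
  company = c("A", "B"),
  value   = c(2, 5)
)

# Mark a copy of dtB so dtB itself is not modified by reference
marked <- copy(dtB)[, pointer := 1]

# Left outer join: every row of dtA is kept; pointer is NA where dtB has no match
joined <- merge(dtA, marked, by = c("date", "company", "value"), all.x = TRUE)

# Keep only the unmatched rows, then drop the helper column
result <- joined[is.na(pointer)][, -c("pointer")]
print(result)
```

Note that merge sorts the output by the join columns by default, so the row order of result may differ from the original order of dtA.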

                                                                        
                                                        
            
            
              
                
                  离开以前        
                
              
                            
                2021-01-06 05:35
              
            
            
                                                                       
Use an anti-join:

dtA[!dtB, on=.(date, company, value)]


This returns every row of dtA that has no match in dtB on the columns listed in on.
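
A small self-contained sketch of the anti-join, assuming toy data reconstructed from the tables in the other answer (the real dtA is far larger). The name kept is only illustrative.

```r
library(data.table)

# Toy data reconstructed from the tables in the other answer
dtA <- data.table(
  date    = c(198101, 198101, 198101, 198102, 198102, 198102),
  company = c("A", "A", "B", "A", "B", "B"),
  value   = c(1, 2, 5, 2, 5, 6)
)
dtB <- data.table(
  date    = c(198101, 198102),
  company = c("A", "B"),
  value   = c(2, 5)
)

# Anti-join: rows of dtA with no match in dtB on all three columns
kept <- dtA[!dtB, on = .(date, company, value)]
print(kept)

# Neither dtA nor dtB is modified; assign back to replace dtA:
# dtA <- kept
```

This needs no helper column and no keys, and it keeps the rows in their original dtA order.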
                                                                        
                                                        
            
            
              
                