Remove one dataframe from another with Pandas

前端未结

关注

 5  906

I have two dataframes of different size (df1 nad df2). I would like to remove from df1 all the rows which are stored within df2<


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  抹茶落季        
                
              
                            
                2021-01-20 00:33
              
            
            
                                                                       
I think the cleanest way can be:
We have base dataframe D and want to remove a subset D1. Let the output be D2
D2 = pd.DataFrame(D, index = set(D.index).difference(set(D1.index))).reset_index()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2021-01-20 00:43
              
            
            
                                                                       
The cleanest way I found was to use drop from pandas using the index of the dataframe you want to drop:

df1.drop(df2.index, axis=0,inplace=True)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  忘掉有多难        
                
              
                            
                2021-01-20 00:50
              
            
            
                                                                       
Use merge with outer join with filter by query, last remove helper column by drop:

df = pd.merge(df1, df2, on=['A','B'], how='outer', indicator=True)
       .query("_merge != 'both'")
       .drop('_merge', axis=1)
       .reset_index(drop=True)
print (df)
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2021-01-20 00:50
              
            
            
                                                                       
pandas has a method called isin, however this relies on unique indices. We can define a lambda function to create columns we can use in this from the existing 'A' and 'B' of df1 and df2. We then negate this (as we want the values not in df2) and reset the index:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
print df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True)


printing:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情深已故        
                
              
                            
                2021-01-20 00:53
              
            
            
                                                                       
You can use np.in1d to check if any row in df1 exists in df2. And then use it as a reversed mask to select rows from df1.

df1[~df1[['A','B']].apply(lambda x: np.in1d(x,df2).all(),axis=1)]\
                   .reset_index(drop=True)
Out[115]: 
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复