Mapping methods across multiple columns in a Pandas DataFrame

I have a Pandas dataframe where the values are lists:

import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})
DF

3 answers

傲寒, 2021-01-23 21:38
                                                                       
Option 1

Convert the lists to sets, take the row-wise set difference X - Y, and use np.where (an empty difference means X is a subset of Y)

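import numpy as np
# Convert every list to a set (applymap is deprecated in favour of DataFrame.map in pandas 2.1)
df_temp = DF.applymap(set)
# An empty difference is falsy and means X is a subset of Y, hence the False/True order
DF['x_sub_y'] = np.where(df_temp.X - df_temp.Y, False, True)
DF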
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
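
A minimal sketch that prints the intermediate set differences behind Option 1. It builds the sets with `Series.apply(set)` per column rather than `applymap`, an assumption made so it runs the same on pandas versions where `applymap` is deprecated.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# Row-wise set difference X - Y; an empty set means every element of X is in Y
diff = DF.X.apply(set) - DF.Y.apply(set)
print(diff.tolist())  # [set(), {2}]

# np.where tests each set's truthiness: set() is falsy, {2} is truthy,
# so the False/True branches are swapped to get "X is a subset of Y"
DF['x_sub_y'] = np.where(diff, False, True)
print(DF['x_sub_y'].tolist())  # [True, False]
```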




Option 2

Faster: take the same set difference, cast it with astype(bool) and invert it with ~

# Attribute access binds tighter than ~, so .astype(bool) runs first and ~ inverts its result
DF['x_sub_y'] = ~(DF.X.apply(set) - DF.Y.apply(set)).astype(bool)
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False




Option 3

Fun with np.vectorize

def foo(x):
    # True when the set difference is empty, i.e. X is a subset of Y
    return not x

v = np.vectorize(foo)
DF['x_sub_y'] = v(DF.X.apply(set) - DF.Y.apply(set))
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
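
`foo` only applies Python's `not` to each set. As a swapped-in variant (not part of the original answer), the same mapping can be sketched without `np.vectorize` by passing `operator.not_` to `Series.map`:

```python
import operator

import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# Row-wise set difference X - Y
diff = DF.X.apply(set) - DF.Y.apply(set)

# operator.not_(s) is the function form of `not s`: True for an empty set
DF['x_sub_y'] = diff.map(operator.not_)
print(DF['x_sub_y'].tolist())  # [True, False]
```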


Extending Scott Boston's issubset answer with np.vectorize for speed:

def foo(x, y):
    # issubset accepts any iterable, so only x needs converting to a set
    return set(x).issubset(y)

v = np.vectorize(foo)

DF['x_sub_y'] = v(DF.X, DF.Y)
DF
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
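
The NumPy documentation notes that `np.vectorize` is provided for convenience rather than performance and is essentially a for loop. A plain list comprehension over `zip(DF.X, DF.Y)` spells out that loop directly; this is a swapped-in variant, sketched here without any timing claims.

```python
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# One Python-level loop over paired rows; issubset accepts any iterable for y
DF['x_sub_y'] = [set(x).issubset(y) for x, y in zip(DF.X, DF.Y)]
print(DF['x_sub_y'].tolist())  # [True, False]
```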


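Timings (IPython %timeit; Before = Scott Boston's row-wise DF.apply, After = the np.vectorize version above)

Small

1000 loops, best of 3: 460 µs per loop     # Before
10000 loops, best of 3: 103 µs per loop    # After

Large (df * 10000)

1 loop, best of 3: 1.26 s per loop         # Before
100 loops, best of 3: 13.3 ms per loop     # After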
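
The figures above look like IPython `%timeit` output. A minimal sketch for re-running the comparison with the standard `timeit` module follows. Building the large frame with `pd.concat([DF] * 10000, ignore_index=True)` is an assumption about what "df * 10000" means, the helper names `with_apply` and `with_vectorize` are hypothetical, and the numbers will depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})
# Assumption: "large" means the rows repeated 10,000 times
big = pd.concat([DF] * 10000, ignore_index=True)


def with_apply(df):
    # Hypothetical helper: row-wise DataFrame.apply, as in the issubset answer
    return df.apply(lambda row: set(row.X).issubset(row.Y), axis=1)


def with_vectorize(df):
    # Hypothetical helper: np.vectorize over the two columns
    v = np.vectorize(lambda x, y: set(x).issubset(y))
    return pd.Series(v(df.X, df.Y), index=df.index)


# Both approaches should agree before their speed is compared
assert with_apply(big).tolist() == with_vectorize(big).tolist()

for name, func in [('apply', with_apply), ('vectorize', with_vectorize)]:
    # Best of 3 single calls
    best = min(timeit.repeat(lambda: func(big), number=1, repeat=3))
    print(f'{name}: {best * 1e3:.1f} ms per call')
```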

执笔经年, 2021-01-23 21:39
                                                                       
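Or you can use sets: X is a subset of Y exactly when the union of X and Y has the same elements as Y.

# Concatenate the lists row-wise, then compare as sets so element order and duplicates do not matter
DF['x_sub_y'] = (DF.X + DF.Y).apply(set) == DF.Y.apply(set)
DF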
Out[691]: 
        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
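
The union idea works for any element order as long as both sides are compared as sets rather than as lists. A sketch with a hypothetical, unsorted `Y` column:

```python
import pandas as pd

# Hypothetical data: same elements as the question, but Y is not sorted
DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[5, 2, 1], [5, 3, 1]]})

# Concatenate each row's lists, then compare sets so order does not matter
DF['x_sub_y'] = (DF.X + DF.Y).apply(set) == DF.Y.apply(set)
print(DF['x_sub_y'].tolist())  # [True, False]
```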

长情又很酷, 2021-01-23 21:54
                                                                       
Use set and issubset:

# assign returns a new DataFrame with the extra column; DF itself is unchanged
DF.assign(x_sub_y=DF.apply(lambda row: set(row.X).issubset(set(row.Y)), axis=1))


Output:

        X          Y  x_sub_y
0  [1, 5]  [1, 2, 5]     True
1  [1, 2]  [1, 3, 5]    False
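
Per the Python documentation for set types, `set.issubset` accepts any iterable, while the equivalent `<=` operator requires both operands to be sets. A small sketch of both spellings on the question's data:

```python
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# issubset accepts any iterable, so Y does not need an explicit set() call
via_method = DF.apply(lambda row: set(row.X).issubset(row.Y), axis=1)

# <= is the operator form of the same test; both operands must be sets
via_operator = DF.apply(lambda row: set(row.X) <= set(row.Y), axis=1)

print(via_method.tolist(), via_operator.tolist())  # [True, False] [True, False]
```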
