Finding an array element's location in a pandas DataFrame column (a.k.a. pd.Series)


I have a pandas DataFrame similar to this one:

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,5


                      
5 Answers

名媛妹妹
2021-01-12 02:58

This should do it:

df.loc[df.Col4.isin(target_array)].index
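
As a self-contained sketch of the one-liner above, assuming the sample frame and target_array from the question (the full values are taken from the copy of the question's code in a later answer), the call can be run end to end roughly like this. The names mask and matching_labels are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data as given in the question
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Boolean Series: True where the Col4 value is one of the targets
mask = df.Col4.isin(target_array)

# Select the matching rows, then read off their index labels
matching_labels = df.loc[mask].index
print(list(matching_labels))  # ['R1', 'R3', 'R4']
```

Note that 'EEE' is in target_array but never occurs in Col4, so it simply contributes no rows.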




EDIT:

I timed three of the options from the answers here: mine, Bruce Pucci's, and Divakar's.

Divakar's was faster by a large margin, so I'd pick his.
                                                                        
                                                        
            
            
              
                
梦如初夏
2021-01-12 03:09

For the sake of completeness, I've added two .query() variants. Here are my timings against a 400K-row df:

In [63]: df.shape
Out[63]: (400000, 4)

In [64]: %timeit df.index[np.in1d(df['Col4'],target_array)]
10 loops, best of 3: 35.1 ms per loop

In [65]: %timeit df.index[df.Col4.isin(target_array)]
10 loops, best of 3: 36.7 ms per loop

In [66]: %timeit df.loc[df.Col4.isin(target_array)].index
10 loops, best of 3: 47.8 ms per loop

In [67]: %timeit df.query('@target_array.tolist() == Col4')
10 loops, best of 3: 45.7 ms per loop

In [68]: %timeit df.query('@target_array in Col4')
10 loops, best of 3: 51.9 ms per loop
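
For anyone wanting to repeat a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The 400K-row frame is an assumption (the question's four sample rows repeated 100,000 times), since the frame used for the timings above is not shown. np.isin stands in for np.in1d, which is deprecated as of NumPy 2.0, and the .query() variants are left out for brevity. Absolute numbers will depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test frame: the question's four sample rows repeated 100,000 times
base = pd.DataFrame({'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
                     'Col3': [100, 50, -30, -50],
                     'Col4': ['AAA', 'BBB', 'AAA', 'CCC']})
big_df = pd.concat([base] * 100_000, ignore_index=True)
target_array = np.array(['AAA', 'CCC', 'EEE'])

# The three non-query variants from the answers, keyed by a short label
candidates = {
    'np.isin': lambda: big_df.index[np.isin(big_df['Col4'], target_array)],
    'index[isin]': lambda: big_df.index[big_df.Col4.isin(target_array)],
    'loc[isin].index': lambda: big_df.loc[big_df.Col4.isin(target_array)].index,
}

# Check that all variants return the same labels before comparing speed
results = {name: fn() for name, fn in candidates.items()}
reference = results['index[isin]']
all_agree = all(reference.equals(r) for r in results.values())
print('all variants agree:', all_agree)

# Average wall-clock time over a few calls of each variant
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f'{name:>16}: {seconds * 1000:.1f} ms per call')
```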


Here is a similar comparison for (not in ...) and for different dtypes
                                                                        
                                                        
            
            
              
                
伪装坚强ぢ
2021-01-12 03:09

df.index[df.Col4.isin(target_array)]

                                                                        
                                                        
            
            
              
                
攒了一身酷
2021-01-12 03:12

import pandas as pd
import numpy as np

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])

df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Flag each row: True where the Col4 value appears in target_array
df['in_col'] = df['Col4'].apply(lambda x: x in target_array)


Is this what you were looking for? You can then group by the new column, or filter on it to keep only the True rows.
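
A minimal sketch of that follow-up step, assuming the same sample data as in the code above. The names matched and groups are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
df['in_col'] = df['Col4'].apply(lambda x: x in target_array)

# Filter: keep only the rows flagged True and read off their index labels
matched = df[df['in_col']]
print(list(matched.index))  # ['R1', 'R3', 'R4']

# Group: split the row labels into unmatched (False) and matched (True)
groups = {bool(flag): list(g.index) for flag, g in df.groupby('in_col')}
print(groups)  # {False: ['R2'], True: ['R1', 'R3', 'R4']}
```

The apply call runs a Python-level membership test per row, so on large frames the isin-based answers are likely to be faster.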
                                                                        
                                                        
            
            
              
                
小鲜肉
2021-01-12 03:15

You can use NumPy's in1d:

df.index[np.in1d(df['Col4'],target_array)]


Explanation

1) Create a 1D boolean mask with one entry per row, telling us whether that row's Col4 element matches any element in target_array:

mask = np.in1d(df['Col4'],target_array)


2) Use the mask to select the matching index labels from the DataFrame as the final output:

out = df.index[np.in1d(df['Col4'],target_array)]
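
The two steps can be sketched end to end as follows, assuming the sample frame and target_array from the question. This sketch swaps in np.isin for np.in1d, since np.in1d is deprecated as of NumPy 2.0. For 1D input the two return the same mask.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
target_array = np.array(['AAA', 'CCC', 'EEE'])
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Step 1: one boolean per row, True where Col4 matches any target
mask = np.isin(df['Col4'], target_array)
print(mask.tolist())  # [True, False, True, True]

# Step 2: index the row labels with the mask
out = df.index[mask]
print(list(out))  # ['R1', 'R3', 'R4']
```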

                                                                        
                                                        
            
            
              
                