Getting a list of indices where pandas boolean series is True


I have a pandas series with boolean entries. I would like to get a list of indices where the values are True.

For example the input pd.Series([Tr


                      
2 answers

        
                         				            
            
           
            
                              
                
              
              
                
                  刺人心        
                
              
                            
                2020-12-05 18:05
              
            
            
                                                                       
Using Boolean Indexing
>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> s[s].index
Int64Index([0, 2, 3, 7], dtype='int64')

If you need a np.array object, get the .values
>>> s[s].index.values
array([0, 2, 3, 7])


Using np.nonzero
>>> np.nonzero(s)
(array([0, 2, 3, 7]),)


Using np.flatnonzero
>>> np.flatnonzero(s)
array([0, 2, 3, 7])


Using np.where
>>> np.where(s)[0]
array([0, 2, 3, 7])


Using np.argwhere
>>> np.argwhere(s).ravel()
array([0, 2, 3, 7])


Using pd.Series.index
>>> s.index[s]
Int64Index([0, 2, 3, 7], dtype='int64')


Using Python's built-in filter
>>> [*filter(s.get, s.index)]
[0, 2, 3, 7]


Using list comprehension
>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]

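One point worth illustrating: the index-based approaches above return index labels, while the NumPy-based ones return integer positions. With the default 0..n-1 index the two coincide, which is why every output above looks the same. The sketch below uses a made-up string index (the names `s_labeled`, `labels`, and `positions` are illustrative only) to show the difference and how to get a plain Python list.

```python
import numpy as np
import pandas as pd

# A boolean Series with string labels instead of the default 0..n-1 index.
s_labeled = pd.Series([True, False, True, True], index=["a", "b", "c", "d"])

# Index-based approaches return the labels of the True entries.
labels = s_labeled[s_labeled].index.tolist()

# NumPy-based approaches return integer positions of the True entries.
positions = np.flatnonzero(s_labeled.to_numpy()).tolist()

# Positions can be mapped back to labels through the index.
labels_from_positions = s_labeled.index[positions].tolist()

print(labels)                 # ['a', 'c', 'd']
print(positions)              # [0, 2, 3]
print(labels_from_positions)  # ['a', 'c', 'd']
```

Calling .tolist() at the end gives the plain list the question asks for, rather than an Index or ndarray.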
                                                                        
                                                        
            
            
              
                
                  盖世英雄少女心        
                
              
                            
                2020-12-05 18:09
              
            
            
                                                                       
As an addition to rafaelc's answer, here are the corresponding timings (from fastest to slowest) for the following setup:
import numpy as np
import pandas as pd
s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])

Using np.where
>>> timeit np.where(s)[0]
12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Using np.flatnonzero
>>> timeit np.flatnonzero(s)
18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Using pd.Series.index
The time difference compared with boolean indexing really surprised me, since boolean indexing is the more commonly used of the two.
>>> timeit s.index[s]
82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Using Boolean Indexing
>>> timeit s[s].index
1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a np.array object, get the .values
>>> timeit s[s].index.values
1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a slightly easier-to-read version <-- not in original answer
>>> timeit s[s==True].index
1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Using pd.Series.where <-- not in original answer
>>> timeit s.where(s).dropna().index
2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.where(s == True).dropna().index
2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


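Using pd.Series.mask <-- not in original answer
Note that s.mask(s) replaces the True entries with NaN, so as written these expressions return the indices of the False values; pass the inverted mask (for example s.mask(~s)) to get the True ones.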
>>> timeit s.mask(s).dropna().index
2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.mask(s == True).dropna().index
2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Using list comprehension
>>> timeit [i for i in s.index if s[i]]
13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Using Python's built-in filter
>>> timeit [*filter(s.get, s.index)]
14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
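The measurements above look like output from IPython's timeit magic. Outside IPython, a rough way to reproduce the comparison is the standard-library timeit module. The sketch below is an assumption about how one might set that up, not the original benchmark: the seed, the repeat counts, and the `candidates` dictionary are arbitrary choices, and absolute numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Roughly the same setup as above: 1000 random booleans (seed is arbitrary).
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000) > 0.5)

# A few of the approaches being compared, wrapped as zero-argument callables.
candidates = {
    "np.where": lambda: np.where(s)[0],
    "np.flatnonzero": lambda: np.flatnonzero(s),
    "s.index[s]": lambda: s.index[s],
    "s[s].index": lambda: s[s].index,
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats of 200 calls each, converted to seconds per call.
    results[name] = min(timeit.repeat(func, number=200, repeat=3)) / 200

# Print from fastest to slowest.
for name, per_call in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>16}: {per_call * 1e6:8.1f} µs per call")
```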



Using np.nonzero <-- did not work out of the box for me
>>> timeit np.nonzero(s)
ValueError: Length of passed values is 1, index implies 1000.


Using np.argwhere <-- did not work out of the box for me
>>> timeit np.argwhere(s).ravel()
ValueError: Length of passed values is 1, index implies 1000.
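A possible workaround for the two failing calls, assuming the error comes from NumPy trying to wrap the result back into a Series: hand NumPy a plain boolean array instead of the Series itself. This is a sketch on the small example Series, not a claim about what the original answer's environment did.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Convert to a plain ndarray first so NumPy has nothing to wrap back.
mask = s.to_numpy()

idx_nonzero = np.nonzero(mask)[0]         # nonzero returns a tuple; take axis 0
idx_argwhere = np.argwhere(mask).ravel()  # argwhere returns shape (n, 1)

print(idx_nonzero)   # [0 2 3 7]
print(idx_argwhere)  # [0 2 3 7]
```

Both results are integer positions; with a non-default index, s.index[idx_nonzero] maps them back to labels.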


                                                                        
                                                        
            
            
              
                