How do I identify sequences of values in a boolean array?

前端未结

关注

 4  677

I have a long boolean array:

bool_array = [ True, True, True, True, True, False, False, False, False, False, True, True, True, False, False, True, True, True


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  忘掉有多难        
                
              
                            
                2020-12-31 07:42
              
            
            
                                                                       
Using zip and enumerate you can do 

>>> [i for i,(m,n) in enumerate(zip(bool_array[:-1],bool_array[1:])) if m!=n]
[4, 9, 12, 14, 18]


Now that you have [4, 9, 12, 14, 18], you can 

>>> [0]+[i+1 for i in [4, 9, 12, 14, 18]]+[len(bool_array)]
[0, 5, 10, 13, 15, 19, 26]


To achieve your output. 



The logic behind the code: 


zip takes in two iterators and returns a sequence of two elements. We pass the same list for both iterators starting from the first element and one starting from the second. Hence we get a list of adjacent numbers 
enumerate gives you a sequence of indexes and the value of the iterator.
Now we wrap it in a list comprehension. If the zipped values are not the same, we return the index




Another  single step procedure is 

>>> [i for i,(m,n) in enumerate(zip([2]+bool_array,bool_array+[2])) if m!=n]
[0, 5, 10, 13, 15, 19, 26]


Here we are deliberately introducing [2] into the list, This is because the first and the last values will always be different (as [2] is never present in the list). Hence we will get those indexes directly. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  逝去的感伤        
                
              
                            
                2020-12-31 07:42
              
            
            
                                                                       
Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can use and increment a variable within a list comprehension. Coupled with groupby:

from itertools import groupby

# bool_array = [True, True, True, True, True, False, False, False, False, False, True, True, True, False, False, True, True, True, True, False, False, False, False, False, False, False]
total = 0
[total := total + len(list(gp)) for _, gp in groupby(bool_array)]
# [5, 10, 13, 15, 19, 26]


This:


Initializes a variable total to 0 which symbolizes the cumulative sum
Groups consecutive items with groupby (consecutive True will be grouped together, same goes for consecutive False)
For each series of grouped booleans, this both:


increments total with the current length of the series of booleans (total := total + len(list(gp))) via an assignment expression
and at the same time, maps the consecutive series to the new value of total



_{Of course to make this start with 0, you can always plug [0] to the front of the list.}
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  闹比i        
                
              
                            
                2020-12-31 07:50
              
            
            
                                                                       
As a more efficient approach for large datasets, in python 3.X you can use accumulate and  groupby function from itertools module.

>>> from itertools import accumulate, groupby
>>> [0] + list(accumulate(sum(1 for _ in g) for _,g in groupby(bool_array)))
[0, 5, 10, 13, 15, 19, 26]




The logic behind the code:

This code, categorizes the sequential duplicate items using groupby() function, then loops over the iterator returned by groupby() which is contains pairs of keys (that we escaped it using under line instead of a throw away variable) and these categorized iterators.

>>> [list(g) for _, g in groupby(bool_array)]
[[True, True, True, True, True], [False, False, False, False, False], [True, True, True], [False, False], [True, True, True, True], [False, False, False, False, False, False, False]]


So all we need is calculating the length of these iterators and sum each length with its previous length, in order to get the index of first item which is exactly where the item is changed, that is exactly what that accumulate() function is for.   

In Numpy you can use the following approach:

In [19]: np.where(arr[1:] - arr[:-1])[0] + 1
Out[19]: array([ 5, 10, 13, 15, 19])
# With leading and trailing indices
In [22]: np.concatenate(([0], np.where(arr[1:] - arr[:-1])[0] + 1, [arr.size]))
Out[22]: array([ 0,  5, 10, 13, 15, 19, 26])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2020-12-31 07:57
              
            
            
                                                                       
This will tell you where:

>>> import numpy as np
>>> np.argwhere(np.diff(bool_array)).squeeze()
array([ 4,  9, 12, 14, 18])




np.diff calculates the difference between each element and the next.  For booleans, it essentially interprets the values as integers (0: False, non-zero: True), so differences appear as +1 or -1 values, which then get mapped back to booleans (True when there is a change).

The np.argwhere function then tells you where the values are True --- which are now the changes.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复