Efficient pandas/numpy function for time since change


Given a Series, I would like to efficiently compute how many observations have passed since the value last changed. Here is a simple example:

ser = pd.
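To pin down the intended output, here is a plain-Python reference loop. It is a hypothetical helper (time_since_change_loop is not from the question), it assumes "change" means the value differs from the immediately preceding observation, and it is far too slow for large data. Its only purpose is to give faster versions something to check against.

```python
def time_since_change_loop(values):
    """Hypothetical reference helper: for each observation, count how many
    observations have passed since the value last changed (0 at each change)."""
    out = []
    count = 0
    prev = None
    for i, v in enumerate(values):
        if i > 0 and v == prev:
            count += 1  # same value as the previous observation: run continues
        else:
            count = 0   # first observation or a new value: run restarts
        out.append(count)
        prev = v
    return out


print(time_since_change_loop([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3, 3, 3, 3]))
# [0, 1, 2, 3, 0, 1, 2, 0, 0, 1, 2, 3]
```

Iterating over a pandas Series yields its values, so the loop also accepts a Series directly.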


                      
2 answers

Answer by 故里飘歌, 2021-01-21 05:32

Here's one NumPy approach -

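import numpy as np

def array_cumcount(a):
    a = np.asarray(a)  # plain array, so slices compare by position, not index label
    if a.size == 0:
        return np.zeros(0, dtype=int)
    # Indices where a new run of equal values starts
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    shift_arr = np.ones(a.size, dtype=int)
    shift_arr[0] = 0

    if len(idx) >= 1:
        # At each run start, cancel the count accumulated over the previous
        # run so the cumulative sum restarts at 0
        shift_arr[idx[0]] = -idx[0] + 1
        shift_arr[idx[1:]] = -idx[1:] + idx[:-1] + 1
    return shift_arr.cumsum()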


Sample run -

In [583]: ser = pd.Series([1.2,1.2,1.2,1.2,2,2,2,4,3,3,3,3])

In [584]: array_cumcount(ser.values)
Out[584]: array([0, 1, 2, 3, 0, 1, 2, 0, 0, 1, 2, 3])
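The same result can also be expressed as "current position minus the position where the current run started". The sketch below (time_since_change is a hypothetical name, not part of the answer above) uses np.maximum.accumulate to carry the most recent run-start index forward. It assumes a 1-D input and has not been benchmarked against array_cumcount.

```python
import numpy as np

def time_since_change(a):
    """Hypothetical helper: observations since the last change, via run-start indices."""
    a = np.asarray(a)
    n = a.size
    if n == 0:
        return np.zeros(0, dtype=int)
    # True where a new run begins; position 0 always starts a run
    is_start = np.empty(n, dtype=bool)
    is_start[0] = True
    is_start[1:] = a[1:] != a[:-1]
    pos = np.arange(n)
    # Run-start positions where a run begins, 0 elsewhere; the running maximum
    # then gives, for every position, the start index of its current run
    run_start = np.maximum.accumulate(np.where(is_start, pos, 0))
    return pos - run_start


print(time_since_change(np.array([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3, 3, 3, 3])))
# [0 1 2 3 0 1 2 0 0 1 2 3]
```

Compared with the shift-and-cumsum trick, this avoids computing per-run corrections at the cost of one extra temporary array.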


Runtime test -

In [601]: ser = pd.Series(np.random.randint(0,3,(10000)))

# @Psidom's soln
In [602]: %timeit ser.groupby(ser).cumcount()
1000 loops, best of 3: 729 µs per loop

In [603]: %timeit array_cumcount(ser.values)
10000 loops, best of 3: 85.3 µs per loop

In [604]: ser = pd.Series(np.random.randint(0,3,(1000000)))

# @Psidom's soln
In [605]: %timeit ser.groupby(ser).cumcount()
10 loops, best of 3: 30.1 ms per loop

In [606]: %timeit array_cumcount(ser.values)
100 loops, best of 3: 11.7 ms per loop

                                                                        
                                                        
            
            
              
                
Answer by 轮回少年, 2021-01-21 05:46

If each value occurs in only one contiguous run (as in the example), you can use groupby.cumcount:

ser.groupby(ser).cumcount()

#0    0
#1    1
#2    2
#3    3
#4    0
#5    1
#6    2
#7    0
#8    0
#dtype: int64
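One caveat: grouping by the values themselves counts every earlier occurrence of a value, not just the current run, so it only matches "time since change" when no value reappears after a different one. That holds for the example above but not for random data such as the Series in the first answer's runtime test. A sketch of a run-aware variant groups by a run id instead, assuming the data has no missing values (NaN never compares equal, so each NaN would start its own run):

```python
import pandas as pd

ser = pd.Series([1, 1, 2, 1])

# Grouping by value: the last 1 continues the count of the first run of 1s
naive = ser.groupby(ser).cumcount()

# Grouping by run id: True wherever the value differs from the previous one
# (the first row compares against NaN, so it is always True); the cumulative
# sum then labels each contiguous run with its own integer
run_id = ser.ne(ser.shift()).cumsum()
by_run = ser.groupby(run_id).cumcount()

print(naive.tolist())   # [0, 1, 0, 2]
print(by_run.tolist())  # [0, 1, 0, 0]
```

On the example Series from the question, both groupings give the same 0, 1, 2, 3, 0, 1, 2, 0, 0, ... output.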

                                                                        
                                                        
            
            
              
                