Replace a list of numbers with flat sub-ranges


Given a list of numbers, like this:

lst = [0, 10, 15, 17]

I'd like a list that has elements from i -> i + 3 for all


                      
1 answer

故里飘歌, 2021-01-18 23:53

Approach #1: Broadcast-add the offsets 0..3 to every element, then use np.unique to drop the duplicates (np.unique also returns the values sorted) -

import numpy as np
np.unique(np.asarray(lst)[:, None] + np.arange(4))
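
To make the broadcasting step concrete, here is a small sketch on the sample list from the question. The names offsets and grid are illustrative only and do not appear in the original answer.

```python
import numpy as np

lst = [0, 10, 15, 17]

# A (4, 1) column of start values plus a (4,) row of offsets broadcasts to (4, 4).
offsets = np.arange(4)
grid = np.asarray(lst)[:, None] + offsets
# Rows of grid: 0..3, 10..13, 15..18, 17..20

# np.unique flattens, sorts, and removes the overlap (17 and 18 occur twice).
result = np.unique(grid)
print(result.tolist())
# [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
```

Because np.unique sorts its output, the result matches an order-preserving de-duplication only when lst is already in ascending order.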


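Approach #2: Also based on broadcasted summation, but it masks out the overlapping values instead of calling np.unique (this assumes lst is sorted in ascending order) -

def mask_app(lst, interval_len=4):
    arr = np.array(lst)
    r = np.arange(interval_len)
    # Each row holds one start value plus the offsets 0..interval_len-1.
    ranged_vals = arr[:, None] + r
    # Gap from each element to the next one.
    a_diff = arr[1:] - arr[:-1]
    # Keep an offset only if it stays below the next start; keep the last row whole.
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]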
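
As a quick illustration of how the mask behaves, the sketch below runs mask_app on the sample list and then on an unsorted list. The unsorted case is an added example, not part of the original answer; it shows why the input needs to be in ascending order.

```python
import numpy as np

def mask_app(lst, interval_len=4):
    arr = np.array(lst)
    r = np.arange(interval_len)
    ranged_vals = arr[:, None] + r
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]

# Sorted input: the gap between 15 and 17 is 2, so that row keeps only 15 and 16.
out = mask_app([0, 10, 15, 17])
print(out.tolist())
# [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]

# Unsorted input: the gap from 10 to 0 is negative, so the whole first row is masked out.
unsorted_out = mask_app([10, 0])
print(unsorted_out.tolist())
# [0, 1, 2, 3]
```

If the input may arrive unsorted, sorting it first (for example with sorted(lst)) restores the assumption, at the cost of losing the original order.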


Runtime test

Original approach -

from collections import OrderedDict

def org_app(lst):
    # Expand every x to x..x+3, then de-duplicate while keeping first-seen order.
    return list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())
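
Before comparing timings, it can help to confirm that the three approaches produce the same values. This is a minimal sketch, assuming sorted, duplicate-free input as in the benchmark. The wrapper name unique_app, the seed, and the size n are arbitrary choices made for this check, and org_app is written here with an explicit return.

```python
from collections import OrderedDict
import numpy as np

def org_app(lst):
    # Pure-Python version: expand, then de-duplicate in first-seen order.
    return list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())

def unique_app(lst):
    # Hypothetical wrapper name for Approach #1.
    return np.unique(np.asarray(lst)[:, None] + np.arange(4))

def mask_app(lst, interval_len=4):
    arr = np.array(lst)
    r = np.arange(interval_len)
    ranged_vals = arr[:, None] + r
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]

n = 1000
rng = np.random.default_rng(0)
# np.unique returns sorted, duplicate-free values, matching the benchmark setup.
lst = np.unique(rng.integers(0, 4 * n, n)).tolist()

expected = org_app(lst)
same = expected == unique_app(lst).tolist() == mask_app(lst).tolist()
print(same)
```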


Timings -

In [409]: n = 10000

In [410]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()

In [411]: %timeit org_app(lst)
     ...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
     ...: %timeit mask_app(lst, interval_len = 4)
     ...: 
10 loops, best of 3: 32.7 ms per loop
1000 loops, best of 3: 1.03 ms per loop
1000 loops, best of 3: 671 µs per loop

In [412]: n = 100000

In [413]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()

In [414]: %timeit org_app(lst)
     ...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
     ...: %timeit mask_app(lst, interval_len = 4)
     ...: 
1 loop, best of 3: 350 ms per loop
100 loops, best of 3: 14.7 ms per loop
100 loops, best of 3: 9.73 ms per loop


The bottleneck in both posted approaches appears to be the conversion from list to array, though that cost pays off in the steps that follow. To give a sense of the time spent on the conversion for the last dataset -

In [415]: %timeit np.array(lst)
100 loops, best of 3: 5.6 ms per loop
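
Outside IPython, the same comparison can be reproduced with the standard timeit module. The sketch below is only an illustration: the variable names, the seed, and the repeat count are assumptions, and the numbers it prints will differ from the timings above depending on hardware and NumPy version.

```python
import timeit
import numpy as np

n = 100000
rng = np.random.default_rng(0)
arr = np.unique(rng.integers(0, 4 * n, n))  # data held as an ndarray
lst = arr.tolist()                          # the same data as a Python list

reps = 20
# Cost of the list -> array conversion alone.
t_convert = timeit.timeit(lambda: np.array(lst), number=reps) / reps
# Approach #1 starting from the list (pays for the conversion each call).
t_from_list = timeit.timeit(
    lambda: np.unique(np.asarray(lst)[:, None] + np.arange(4)), number=reps) / reps
# Approach #1 starting from data that is already an array.
t_from_arr = timeit.timeit(
    lambda: np.unique(arr[:, None] + np.arange(4)), number=reps) / reps

print(f"conversion only: {t_convert * 1e3:.2f} ms")
print(f"from list:       {t_from_list * 1e3:.2f} ms")
print(f"from array:      {t_from_arr * 1e3:.2f} ms")
```

If the data can be kept as a NumPy array from the start, the conversion cost is avoided entirely.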

                                                                        
                                                        
            
            
              
                