Numpy Array: Efficiently find matching indices

前端未结

关注

 3  741

I have two lists, one of which is massive (millions of elements), the other several thousand. I want to do the following

bigArray=[0,1,0,2,3,2,,.....]

small


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  隐瞒了意图╮        
                
              
                            
                2021-01-12 18:49
              
            
            
                                                                       
So far I don't see any need for numpy; you can make use of defaultdict, provided that you memory is sufficient, it should be if number of observation is not too many millions.

big_list = [0,1,0,2,3,2,5,6,7,5,6,4,5,3,4,3,5,6,5]
small_list = [0,1,2,3,4]

from collections import defaultdict

dicto = defaultdict(list) #dictionary stores all the relevant coordinates
                          #so you don't have to search for them later

for ind, ele in enumerate(big_list):
    dicto[ele].append(ind)


Result:

>>> for ele in small_list:
...     print dicto[ele]
... 
[0, 2]
[1]
[3, 5]
[4, 13, 15]
[11, 14]


This should give you some speed.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  旧时难觅i        
                
              
                            
                2021-01-12 18:59
              
            
            
                                                                       
Numpy provides the function numpy.searchsorted: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html

Example:

>>> import numpy as np
>>> sorted = np.argsort(big_list)
>>> r = np.searchsorted(big_list, small_list, side='right',sorter=sorted)
>>> l  = np.searchsorted(big_list, small_list, side='left',sorter=sorted)
>>> for b, e in zip(l, r):
...     inds = sorted[b:e]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2021-01-12 19:08
              
            
            
                                                                       
In your case you may benefit from presorting your big array. Here is the example demonstrating how you can reduce the time from ~ 45 seconds to 2 seconds (on my laptop)(for one particular set of lengths of the arrays 5e6 vs 1e3). Obviously the solution won't be optimal if the array sizes will be wastly different. E.g. with the default solution the complexity is O(bigN*smallN), but for my suggested solution it is O((bigN+smallN)*log(bigN))

import numpy as np, numpy.random as nprand, time, bisect

bigN = 5e6
smallN = 1000
maxn = 1e7
nprand.seed(1)  
bigArr = nprand.randint(0, maxn, size=bigN)
smallArr = nprand.randint(0, maxn, size=smallN)

# brute force 
t1 = time.time()
for i in range(len(smallArr)):
    inds = np.where(bigArr == smallArr[i])[0]
t2 = time.time()
print "Brute", t2-t1

# not brute force (like nested loop with index scan)
t1 = time.time()
sortedind = np.argsort(bigArr)
sortedbigArr = bigArr[sortedind]
for i in range(len(smallArr)):
    i1 = bisect.bisect_left(sortedbigArr, smallArr[i])
    i2 = bisect.bisect_right(sortedbigArr, smallArr[i])
    inds = sortedind[i1:i2]
t2=time.time()
print "Non-brute", t2-t1


Output:

Brute 42.5278530121

Non-brute 1.57193303108
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复