Intersection complexity

In Python, you can get the intersection of two sets like this:

>>> s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> s2 = {0, 3, 5, 6, 10}
>>> s1 & s2


                      
3 answers

囚心锁ツ, 2020-12-09 10:52

The intersection of two sets of sizes m and n can be computed in O(max{m,n} * log(min{m,n})) time as follows.
Assume m << n.

1. Represent the two sets as lists/arrays (something sortable).
2. Sort the **smaller** list/array (cost: m*log m).
3. Repeat until every element of the bigger list has been checked:
    3.1 Sort the next **m** items of the bigger list (cost: m*log m).
    3.2 In a single merge-style pass, compare the smaller list with the m items you just sorted and keep the elements that appear in both (cost: m).
4. Return the collected elements as the new set.

The loop in step 3 runs n/m times and each iteration costs O(m*log m), so the total time complexity is O(n*log m) for m << n.

I think that's the best bound you can get with this sort-based approach.
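
The steps above can be sketched in Python roughly as follows. This is only an illustration of the chunked sort-and-merge idea, not a tuned implementation; the function name `sorted_chunk_intersection` is made up for this example, and it assumes the elements are distinct and orderable.

```python
def sorted_chunk_intersection(a, b):
    """Intersect two collections of distinct, orderable items.

    Hypothetical helper illustrating the chunked sort-and-merge steps.
    """
    # Step 1: pick the smaller and bigger collection and make them sortable.
    small, big = (a, b) if len(a) <= len(b) else (b, a)
    small = sorted(small)  # step 2: O(m log m)
    big = list(big)
    m = len(small)
    result = set()
    if m == 0:
        return result
    # Step 3: walk the bigger list in chunks of m items (about n/m chunks).
    for start in range(0, len(big), m):
        chunk = sorted(big[start:start + m])  # step 3.1: O(m log m)
        i = j = 0
        # Step 3.2: one merge-style pass over both sorted sequences, O(m).
        while i < m and j < len(chunk):
            if small[i] == chunk[j]:
                result.add(small[i])
                i += 1
                j += 1
            elif small[i] < chunk[j]:
                i += 1
            else:
                j += 1
    return result  # step 4


s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
s2 = {0, 3, 5, 6, 10}
print(sorted_chunk_intersection(s1, s2) == s1 & s2)  # True
```

Each chunk is sorted independently, so no sort ever handles more than m items; that is where the log m factor (rather than log n) comes from.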
                                                                        
                                                        
            
            
              
                
北恋, 2020-12-09 10:53

The intersection algorithm runs in O(min(len(s1), len(s2))) time, assuming the usual O(1) average-case hash lookups.

In pure Python, it looks like this:

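    def intersection(self, other):
        # Iterate over the smaller set and probe the larger one,
        # so the loop runs min(len(self), len(other)) times.
        if len(self) <= len(other):
            little, big = self, other
        else:
            little, big = other, self
        result = set()
        for elem in little:
            # Hash-table membership test: O(1) on average.
            if elem in big:
                result.add(elem)
        return result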


[Answer to the question in the additional edit] The data structure behind sets is a hash table.
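
To make the min(len(s1), len(s2)) behaviour concrete, here is a small sketch that counts membership tests. The helper `intersection_with_count` is a hypothetical standalone variant of the pure Python version above, written only for illustration; it is not how the built-in is instrumented.

```python
def intersection_with_count(s1, s2):
    """Return (intersection, number of membership tests performed)."""
    little, big = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    lookups = 0
    result = set()
    for elem in little:
        lookups += 1  # one hash-table probe into the bigger set
        if elem in big:
            result.add(elem)
    return result, lookups


small = {0, 3, 5, 6, 10}
large = set(range(1, 100_000))

result, lookups = intersection_with_count(large, small)
print(result == large & small)  # True
print(lookups)                  # 5
```

The number of lookups follows the smaller set regardless of argument order, which is why the size of the larger set does not show up in the average-case bound.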
                                                                        
                                                        
            
            
              
                
甜味超标, 2020-12-09 11:00

The answer appears to be a search engine query away; the Time Complexity page at python.org covers it directly. Quick summary:

Average:     O(min(len(s), len(t)))
Worst case:  O(len(s) * len(t))


EDIT: As Raymond points out below, the "worst case" scenario isn't likely to occur. I included it originally to be thorough, and I'm leaving it to provide context for the discussion below, but I think Raymond's right.
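
For context on where the worst case comes from, the sketch below forces every element to share a single hash value, so each lookup has to compare against many stored elements. The `Key` class and the `comparisons_during_intersection` helper are hypothetical names invented for this illustration; the counts depend on CPython's set implementation, so treat them as indicative rather than guaranteed.

```python
class Key:
    """Hypothetical key type that counts equality comparisons."""
    eq_calls = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # collide=True gives every key the same hash value.
        return 0 if self.collide else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def comparisons_during_intersection(collide, size=50):
    s = {Key(i, collide) for i in range(size)}
    t = {Key(i, collide) for i in range(size, 2 * size)}  # disjoint from s
    Key.eq_calls = 0  # ignore comparisons made while building the sets
    result = s & t
    assert not result
    return Key.eq_calls


good = comparisons_during_intersection(collide=False)
bad = comparisons_during_intersection(collide=True)
print(good, bad)  # far more comparisons when every hash collides
```

With well-distributed hashes the lookups rarely need an equality check at all, which is why the degenerate case is unlikely in practice.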
                                                                        
                                                        
            
            
              
                