How does the Except method work in LINQ?


I have the classes:

class SomeClass
{
   public string Name { get; set; }
   public int SomeInt { get; set; }
}


class SomeComparison: IEqualityComparer


                      
5 answers

独厮守ぢ, 2020-12-05 20:52

Your guess was close. The LINQ to Objects Except extension method uses a HashSet<T> internally for the second sequence passed in. That allows it to look up elements in O(1) while iterating over the first sequence to filter out elements that are contained in the second sequence, so the overall effort is O(n+m), where n and m are the lengths of the input sequences. This is the best you can hope for, since you have to look at each element at least once.

For a review of how this might be implemented, I recommend Jon Skeet's EduLinq series. Here is part of its implementation of Except, taken from the full chapter:

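private static IEnumerable<TSource> ExceptImpl<TSource>(
    IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
{
    // Seed the set with everything that must be excluded.
    HashSet<TSource> bannedElements = new HashSet<TSource>(second, comparer);
    foreach (TSource item in first)
    {
        // Add returns false if the item is already in the set, so this skips
        // elements of second and also repeated elements of first.
        if (bannedElements.Add(item))
        {
            yield return item;
        }
    }
}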


Your first implementation, on the other hand, compares each element in the first list to each element in the second list: it performs a cross product. This requires n*m operations, so it runs in O(nm), which becomes prohibitively slow as n and m grow. (This solution is also wrong as written, since it produces duplicate elements.)
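
To make the comparer's role concrete, here is a minimal sketch of calling Except with a custom IEqualityComparer<T>. The question's own comparer is cut off, so NameComparer below is a hypothetical stand-in that assumes two SomeClass objects count as equal when their Name matches. The sample data and the Demo class are also made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SomeClass
{
    public string Name { get; set; }
    public int SomeInt { get; set; }
}

// Hypothetical comparer (assumption): equality is decided by Name only.
class NameComparer : IEqualityComparer<SomeClass>
{
    public bool Equals(SomeClass x, SomeClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name;
    }

    // Must agree with Equals: objects that compare equal need the same hash
    // code, otherwise the hash lookup never gets as far as calling Equals.
    public int GetHashCode(SomeClass obj) => obj.Name?.GetHashCode() ?? 0;
}

static class Demo
{
    public static List<string> Run()
    {
        var list1 = new List<SomeClass>
        {
            new SomeClass { Name = "a", SomeInt = 1 },
            new SomeClass { Name = "b", SomeInt = 2 },
            new SomeClass { Name = "c", SomeInt = 3 },
            new SomeClass { Name = "d", SomeInt = 4 },
        };
        var list2 = new List<SomeClass>
        {
            new SomeClass { Name = "a", SomeInt = 10 },
            new SomeClass { Name = "b", SomeInt = 20 },
            new SomeClass { Name = "c", SomeInt = 30 },
        };

        // Elements of list1 with no Name match in list2.
        return list1.Except(list2, new NameComparer())
                    .Select(x => x.Name)
                    .ToList();
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run())); // prints "d"
}
```

Because the set lookup goes through GetHashCode first, a comparer that overrides Equals but returns unrelated hash codes would silently fail to match anything.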
                                                                        
                                                        
            
            
              
                
你的背包, 2020-12-05 20:52

from a in list1 from b in list2 creates list1.Count * list2.Count elements and is not the same as list1.Except(list2)!

If list1 has the elements { a, b, c, d } and list2 the elements { a, b, c }, then your first query will yield the following pairs:  

(a,a), (a,b), (a,c),  
(b,a), (b,b), (b,c),  
(c,a), (c,b), (c,c),  
(d,a), (d,b), (d,c)


Because you exclude equal items, the result will be

(a,b), (a,c),  
(b,a), (b,c),  
(c,a), (c,b),  
(d,a), (d,b), (d,c)


And because you select only the first element of the pairs, you will get

{ a, a, b, b, c, c, d, d, d }



The second query will yield { a, b, c, d } minus { a, b, c }, i.e. { d }.
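
The difference can be checked with a short sketch. The original query is not shown in full, so the cross-product version below is an assumption reconstructed from the description (two from clauses, a filter on unequal items, and a select of the first element). The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CrossVsExcept
{
    // Assumed shape of the first query: every (a, b) pair with a != b,
    // projected to a.
    public static List<string> CrossProduct(IEnumerable<string> list1, IEnumerable<string> list2) =>
        (from a in list1
         from b in list2
         where a != b
         select a).ToList();

    // The second query: set difference.
    public static List<string> SetDifference(IEnumerable<string> list1, IEnumerable<string> list2) =>
        list1.Except(list2).ToList();

    static void Main()
    {
        var list1 = new[] { "a", "b", "c", "d" };
        var list2 = new[] { "a", "b", "c" };

        // prints "a, a, b, b, c, c, d, d, d"
        Console.WriteLine(string.Join(", ", CrossProduct(list1, list2)));

        // prints "d"
        Console.WriteLine(string.Join(", ", SetDifference(list1, list2)));
    }
}
```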



If no hash table were used in Except, the result would be a nested loop running in O(m*n). With a hash table, the query runs in approximately O(n) (neglecting the cost of filling the hash table).
                                                                        
                                                        
            
            
              
                
清歌不尽, 2020-12-05 20:53

It seems to me this would be more efficient:

private static IEnumerable<TSource> ExceptImpl<TSource>(
    IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
{
    HashSet<TSource> bannedElements = new HashSet<TSource>(second, comparer);
    foreach (TSource item in first)
    {
        if (!bannedElements.Contains(item))
        {
            yield return item;
        }
    }
}


Contains is O(1).

Add is O(1) only if Count is less than the capacity of the internal array. If the HashSet object must be resized, Add becomes an O(n) operation, where n is Count.
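
One caveat worth checking before swapping Add for Contains: the two versions behave differently when the first sequence contains repeated elements. The sketch below compares them under that assumption. The class and method names are hypothetical, and the default equality comparer is used to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AddVsContains
{
    // Add-based filter: an item is yielded only the first time it is seen,
    // because Add returns false for anything already in the set.
    public static IEnumerable<T> WithAdd<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var banned = new HashSet<T>(second);
        foreach (var item in first)
        {
            if (banned.Add(item)) yield return item;
        }
    }

    // Contains-based filter: the set never changes, so repeated items in
    // first are yielded every time they occur.
    public static IEnumerable<T> WithContains<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var banned = new HashSet<T>(second);
        foreach (var item in first)
        {
            if (!banned.Contains(item)) yield return item;
        }
    }

    static void Main()
    {
        var first = new[] { 1, 1, 2, 3, 3 };
        var second = new[] { 2 };

        Console.WriteLine(string.Join(", ", WithAdd(first, second)));      // prints "1, 3"
        Console.WriteLine(string.Join(", ", WithContains(first, second))); // prints "1, 1, 3, 3"
        Console.WriteLine(string.Join(", ", first.Except(second)));        // prints "1, 3"
    }
}
```

So the Contains version skips the occasional resize, but it no longer returns a set difference. Whether that trade is acceptable depends on whether duplicates in the first sequence should survive.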
                                                                        
                                                        
            
            
              
                
清歌不尽, 2020-12-05 20:55

This is the way I think about it:

IEnumerable<T> Except<T>(IEnumerable<T> a,IEnumerable<T> b)
{
    return a.Where(x => !b.Contains(x)).Distinct();
}

                                                                        
                                                        
            
            
              
                
长发绾君心, 2020-12-05 21:11

The two code examples do not produce the same results.

Your old code creates the Cartesian Product of the two lists.

That means it will return each element a in list1 multiple times - once for each element b in list2 that is not equal to a.

With "large" lists, this will take a long time.
                                                                        
                                                        
            
            
              
                