SQL Efficiency: WHERE IN Subquery vs. JOIN then GROUP

后端未结

关注

 7  1654

As an example, I want to get the list of all items with certain tags applied to them. I could do either of the following:

SELECT Item.ID, Item.Name
FROM Item
WH


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  忘掉有多难        
                
              
                            
                2021-02-07 09:47
              
            
            
                                                                       

SELECT Item.ID, Item.Name
FROM Item
WHERE Item.ID IN (
    SELECT ItemTag.ItemID
    FROM ItemTag
    WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)

  
  or

SELECT Item.ID, Item.Name
FROM Item
LEFT JOIN ItemTag ON ItemTag.ItemID = Item.ID
WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
GROUP BY Item.ID



Your second query won't compile, since it references Item.Name without either grouping or aggregating on it.

If we remove GROUP BY from the query:

SELECT  Item.ID, Item.Name
FROM    Item
JOIN    ItemTag
ON      ItemTag.ItemID = Item.ID
WHERE   ItemTag.TagID = 57 OR ItemTag.TagID = 55


these are still different queries, unless ItemTag.ItemId is a UNIQUE key and marked as such.

SQL Server is able to detect an IN condition on a UNIQUE column, and will just transform the IN condition into a JOIN.

If ItemTag.ItemID is not UNIQUE, the first query will use a kind of a SEMI JOIN algorithm, which are quite efficient in SQL Server.

You can trasform the second query into a JOIN:

SELECT  Item.ID, Item.Name
FROM    Item
JOIN    (
        SELECT DISTINCT ItemID
        FROMT  ItemTag
        WHERE  ItemTag.TagID = 57 OR ItemTag.TagID = 55
        ) tags
ON      tags.ItemID = Item.ID


but this one is a trifle less efficient than IN or EXISTS.

See this article in my blog for a more detailed performance comparison:


IN vs. JOIN vs. EXISTS

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2021-02-07 09:49
              
            
            
                                                                       
run this:

SET SHOWPLAN_ALL ON


then run each version of the query

you can see if they return the same plan, and if not look at the TotalSubtreeCost on the first row of each and see how different they are.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2021-02-07 09:52
              
            
            
                                                                       
The second one is more efficient in MySQL. MySQL will re-execute the query within the IN statement for every WHERE condition test.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2021-02-07 10:02
              
            
            
                                                                       
I think it would depend on how the optimizer handles them, it may even be the case that you end up with the same performance. Display execution plan is your friend here.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2021-02-07 10:09
              
            
            
                                                                       
SELECT Item.ID, Item.Name
...
GROUP BY Item.ID


This is not valid T-SQL.  Item.Name must appear in the group by clause or within an aggregate function, such as SUM or MAX.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  天命终不由人        
                
              
                            
                2021-02-07 10:09
              
            
            
                                                                       
It's pretty much impossible (unless you're one of those crazy guru DBAs) to tell what will be fast and what won't without looking at the execution plan and/or running some stress tests.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复