Is there a performance difference between using GROUP BY with MAX() as the aggregate and ROW_NUMBER() with a PARTITION BY clause?

Asked by 前端未结

Is there a performance difference between the following two queries, and if so, which one is better?

    select 
    q.id, 
    q.name 
    from(
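
The asker's two queries are cut off above. As a hypothetical sketch only (not the original queries), a common pair of shapes for "the highest id per name" is shown below. It runs against an in-memory SQLite database through Python's standard `sqlite3` module rather than SQL Server; the table `q` and its rows are made up, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table "q": the goal is the highest id for each name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO q (id, name) VALUES
        (1, 'a'), (2, 'a'), (3, 'b'), (4, 'c'), (5, 'c'), (6, 'c');
""")

# Shape 1: number the rows within each name by descending id, keep row 1.
ROW_NUMBER_SQL = """
    SELECT id, name
    FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM q
    ) AS ranked
    WHERE rn = 1
    ORDER BY name
"""

# Shape 2: aggregate once per name.
GROUP_BY_SQL = """
    SELECT MAX(id) AS id, name
    FROM q
    GROUP BY name
    ORDER BY name
"""

row_number_rows = conn.execute(ROW_NUMBER_SQL).fetchall()
group_by_rows = conn.execute(GROUP_BY_SQL).fetchall()
print(row_number_rows)
print(group_by_rows)
```

Both shapes return the same rows here; the thread below is about which one the engine executes faster.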

Answer by 灰色年华 (2021-01-18 00:29)

I had a table of about 4.5M rows, wrote both a MAX() with GROUP BY solution and a ROW_NUMBER() solution, and tested them. The MAX version requires two clustered index scans of the table, one to aggregate and a second to join back to the rest of the columns, whereas ROW_NUMBER needs only one. (Obviously one or both of these could be supported by an index to minimize I/O, but the point is that the GROUP BY approach requires two index scans.)

According to the optimizer's estimated subtree cost, ROW_NUMBER is about 60% more efficient in my case, and according to SET STATISTICS TIME it uses about 20% less CPU time. In actual elapsed time, however, the ROW_NUMBER solution takes about 80% longer. So GROUP BY wins in my case.

This seems to match the other answers here.  
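
To make the two-scan point concrete, here is a hypothetical sketch on a made-up table `t` with an extra `payload` column, run on SQLite through Python's `sqlite3` module rather than SQL Server (window functions assume SQLite 3.25+). When you need columns beyond the grouping key and the maximum, the MAX()/GROUP BY form has to join the aggregate back to the table, so `t` appears twice in the query; the ROW_NUMBER() form reads it once and carries every column along.

```python
import sqlite3

# Hypothetical table with a column ("payload") that GROUP BY cannot return directly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL, payload TEXT);
    INSERT INTO t (id, name, payload) VALUES
        (1, 'a', 'old a'), (2, 'a', 'new a'),
        (3, 'b', 'only b'),
        (4, 'c', 'old c'), (5, 'c', 'new c');
""")

# MAX + GROUP BY: t is referenced twice, once to aggregate and
# once in the join that fetches the remaining columns.
GROUP_BY_JOIN_SQL = """
    SELECT t.id, t.name, t.payload
    FROM t
    JOIN (SELECT name, MAX(id) AS max_id FROM t GROUP BY name) AS m
      ON t.name = m.name AND t.id = m.max_id
    ORDER BY t.name
"""

# ROW_NUMBER: t is referenced once; every column travels with the rank.
ROW_NUMBER_SQL = """
    SELECT id, name, payload
    FROM (
        SELECT id, name, payload,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM t
    ) AS ranked
    WHERE rn = 1
    ORDER BY name
"""

joined = conn.execute(GROUP_BY_JOIN_SQL).fetchall()
ranked = conn.execute(ROW_NUMBER_SQL).fetchall()
print(joined)
print(ranked)
```

Both return the same rows; whether the single pass is actually faster is a measurement question, and the numbers above show that the estimated cost and the elapsed time can disagree.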

Answer by 失恋的感觉 (2021-01-18 00:37)

I'd use the GROUP BY name version.

There's not much in it when the index is on (name, id DESC) (Plan 1), but if the index is declared as (name, id ASC) (Plan 2), then on SQL Server 2008 I see that the ROW_NUMBER version is unable to use this index and gets a sort operator, whereas the GROUP BY version is able to use a backward index scan to avoid the sort.

You'd need to check the plans on your version of SQL Server and with your data and indexes to be sure.
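
Checking plans yourself can be scripted. The sketch below is hypothetical and uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module instead of SQL Server execution plans, so its output will not reproduce the SQL Server 2008 behavior described above; it only shows how to compare both query shapes under an index on (name, id DESC) versus (name, id ASC). The table `q`, index `ix_q`, and data are made up.

```python
import sqlite3

ROW_NUMBER_SQL = """
    SELECT id, name FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM q
    ) AS ranked
    WHERE rn = 1
"""
GROUP_BY_SQL = "SELECT MAX(id) AS id, name FROM q GROUP BY name"


def query_plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def plans_for_index(index_columns):
    """Build a fresh table with the given index and return both queries' plans."""
    conn = sqlite3.connect(":memory:")
    # id is an ordinary column here (not the rowid alias), so the index drives access.
    conn.execute("CREATE TABLE q (id INTEGER NOT NULL, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO q (id, name) VALUES (?, ?)",
                     [(i, f"name{i % 10}") for i in range(1, 1001)])
    conn.execute(f"CREATE INDEX ix_q ON q ({index_columns})")
    conn.execute("ANALYZE")
    plans = {
        "row_number": query_plan(conn, ROW_NUMBER_SQL),
        "group_by": query_plan(conn, GROUP_BY_SQL),
    }
    conn.close()
    return plans


results = {}
for cols in ("name, id DESC", "name, id ASC"):
    results[cols] = plans_for_index(cols)
    for query, steps in results[cols].items():
        print(f"index ({cols}) / {query}:")
        for step in steps:
            print("   ", step)
```

Look for steps that mention a temporary B-tree (SQLite's equivalent of an extra sort) in one shape but not the other; on SQL Server, the actual execution plan gives the authoritative answer.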

Answer by 旧时难觅i (2021-01-18 00:47)

The GROUP BY should be faster. ROW_NUMBER() has to assign a number to every row in the table, and it does this before filtering out the rows it doesn't want.

The second query is by far the better construct. In the first, you have to make sure that the columns in the PARTITION BY clause match the columns you want. More importantly, GROUP BY is a well-understood construct in SQL. I would also speculate that GROUP BY might make better use of indexes, but that is only speculation.
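
Both points in this answer can be sketched on a made-up table (SQLite through Python's `sqlite3` module, not SQL Server; window functions assume SQLite 3.25+). Logically, the inner ROW_NUMBER() query numbers every row of `q` before the `rn = 1` filter runs, and if the PARTITION BY list drifts from the intended grouping (here via a hypothetical extra column `region`), the result silently stops matching the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q (id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT NOT NULL);
    INSERT INTO q (id, name, region) VALUES
        (1, 'a', 'east'), (2, 'a', 'west'),
        (3, 'b', 'east'), (4, 'b', 'east');
""")

# The inner query numbers every row in q; filtering happens only afterwards.
numbered = conn.execute("""
    SELECT id, name, region,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM q
""").fetchall()

# Intended result: one row per name.
group_by = conn.execute(
    "SELECT MAX(id), name FROM q GROUP BY name ORDER BY name").fetchall()

# Mistake: partitioning by (name, region) instead of name returns one row
# per (name, region) pair, which no longer matches the GROUP BY above.
mismatched = conn.execute("""
    SELECT id, name FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY name, region ORDER BY id DESC) AS rn
        FROM q
    ) AS ranked
    WHERE rn = 1
    ORDER BY name, id
""").fetchall()

print(len(numbered))  # 4: one numbered row per table row
print(group_by)
print(mismatched)
```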