Get top 1 row of each group

后端未结

关注

 20  2996

I have a table which I want to get the latest entry for each group. Here\'s the table:

DocumentStatusLogs Table

|ID| DocumentID | Status


                      
              相关标签:


      
      
        
          20条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  离开以前        
                
              
                            
                2020-11-21 05:15
              
            
            
                                                                       
It is checked in SQLite that you can use the following simple query with GROUP BY

SELECT MAX(DateCreated), *
FROM DocumentStatusLogs
GROUP BY DocumentID


Here MAX help to get the maximum DateCreated FROM each group.

But it seems that MYSQL doesn't associate *-columns with the value of max DateCreated :(
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  走了就别回头了        
                
              
                            
                2020-11-21 05:16
              
            
            
                                                                       
I've done some timings over the various recommendations here, and the results really depend on the size of the table involved, but the most consistent solution is using the CROSS APPLY  These tests were run against SQL Server 2008-R2, using a table with 6,500 records, and another (identical schema) with 137 million records.  The columns being queried are part of the primary key on the table, and the table width is very small (about 30 bytes).  The times are reported by SQL Server from the actual execution plan.

Query                                  Time for 6500 (ms)    Time for 137M(ms)

CROSS APPLY                                    17.9                17.9
SELECT WHERE col = (SELECT MAX(COL)…)           6.6               854.4
DENSE_RANK() OVER PARTITION                     6.6               907.1


I think the really amazing thing was how consistent the time was for the CROSS APPLY regardless of the number of rows involved.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  盖世英雄少女心        
                
              
                            
                2020-11-21 05:16
              
            
            
                                                                       
In scenarios where you want to avoid using row_count(), you can also use a left join:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
left join DocumentStatusLogs filter 
    ON ds.DocumentID = filter.DocumentID
    -- Match any row that has another row that was created after it.
    AND ds.DateCreated < filter.DateCreated
-- then filter out any rows that matched 
where filter.DocumentID is null 


For the example schema, you could also use a "not in subquery", which generally compiles to the same output as the left join: 

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
WHERE ds.ID NOT IN (
    SELECT filter.ID 
    FROM DocumentStatusLogs filter
    WHERE ds.DocumentID = filter.DocumentID
        AND ds.DateCreated < filter.DateCreated)


Note, the subquery pattern wouldn't work if the table didn't have at least one single-column unique key/constraint/index, in this case the primary key "Id".

Both of these queries tend to be more "expensive" than the row_count() query (as measured by Query Analyzer).  However, you might encounter scenarios where they return results faster or enable other optimizations.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2020-11-21 05:17
              
            
            
                                                                       
;WITH cte AS
(
   SELECT *,
         ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
   FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1


If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead

As for normalised or not, it depends if you want to:


maintain status in 2 places
preserve status history
...


As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  名媛妹妹        
                
              
                            
                2020-11-21 05:21
              
            
            
                                                                       
SELECT documentid, 
       status, 
       datecreated 
FROM   documentstatuslogs dlogs 
WHERE  status = (SELECT status 
                 FROM   documentstatuslogs 
                 WHERE  documentid = dlogs.documentid 
                 ORDER  BY datecreated DESC 
                 LIMIT  1) 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  忘了有多久        
                
              
                            
                2020-11-21 05:22
              
            
            
                                                                       
I just learned how to use cross apply. Here's how to use it in this scenario:

 select d.DocumentID, ds.Status, ds.DateCreated 
 from Documents as d 
 cross apply 
     (select top 1 Status, DateCreated
      from DocumentStatusLogs 
      where DocumentID = d.DocumentId
      order by DateCreated desc) as ds

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
4
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复