Get top 1 row of each group

余生分开走 2020-11-21 04:42

I have a table from which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

|ID| DocumentID | Status         


        
20 answers
  •  逝去的感伤
    2020-11-21 05:13

    This is quite an old thread, but I thought I'd throw in my two cents all the same, as the accepted answer didn't work particularly well for me. I tried gbn's solution on a large dataset and found it terribly slow (>45 seconds on 5 million-plus records in SQL Server 2012). The execution plan makes the cause obvious: it requires a SORT operation, which slows things down significantly.

    Here's an alternative that I lifted from Entity Framework. It needs no SORT operation and does a nonclustered index seek instead. This cuts the execution time to under 2 seconds on the same record set.

    SELECT 
    [Limit1].[DocumentID] AS [DocumentID], 
    [Limit1].[Status] AS [Status], 
    [Limit1].[DateCreated] AS [DateCreated]
    FROM   (SELECT DISTINCT [Extent1].[DocumentID] AS [DocumentID] FROM [dbo].[DocumentStatusLogs] AS [Extent1]) AS [Distinct1]
    OUTER APPLY  (SELECT TOP (1) [Project2].[ID] AS [ID], [Project2].[DocumentID] AS [DocumentID], [Project2].[Status] AS [Status], [Project2].[DateCreated] AS [DateCreated]
        FROM (SELECT 
            [Extent2].[ID] AS [ID], 
            [Extent2].[DocumentID] AS [DocumentID], 
            [Extent2].[Status] AS [Status], 
            [Extent2].[DateCreated] AS [DateCreated]
            FROM [dbo].[DocumentStatusLogs] AS [Extent2]
            WHERE ([Distinct1].[DocumentID] = [Extent2].[DocumentID])
        )  AS [Project2]
        ORDER BY [Project2].[ID] DESC) AS [Limit1]
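
    The generated query above is verbose, and the answer does not say which index it ran against. As a rough sketch only: the index below is an assumption (its name and column choice are hypothetical, not taken from this thread), and the SELECT is a hand-simplified form of the same DISTINCT plus OUTER APPLY / TOP (1) shape. With an index keyed this way, each per-document lookup should be able to seek to the highest ID without sorting.

    ```sql
    -- Hypothetical supporting index (an assumption, not shown in the answer):
    -- keyed on DocumentID, then ID descending, covering the selected columns.
    CREATE NONCLUSTERED INDEX IX_DocumentStatusLogs_DocumentID_ID
        ON dbo.DocumentStatusLogs (DocumentID ASC, ID DESC)
        INCLUDE (Status, DateCreated);

    -- Hand-simplified form of the Entity Framework query:
    -- one row per DocumentID, taking the row with the highest ID.
    SELECT latest.DocumentID,
           latest.Status,
           latest.DateCreated
    FROM (SELECT DISTINCT DocumentID
          FROM dbo.DocumentStatusLogs) AS d
    OUTER APPLY (SELECT TOP (1) l.DocumentID, l.Status, l.DateCreated
                 FROM dbo.DocumentStatusLogs AS l
                 WHERE l.DocumentID = d.DocumentID
                 ORDER BY l.ID DESC) AS latest;
    ```

    Whether the optimizer actually picks a seek depends on your data and statistics, so compare the execution plans before and after on your own table.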
    

    I'm assuming something here that isn't entirely specified in the original question. If your table is designed so that the ID column is an auto-increment ID and DateCreated is set to the current date on each insert, then even without my query above you could get a sizable performance boost from gbn's solution (about half the execution time) just by ordering on ID instead of DateCreated. Under that assumption the two columns give an identical sort order, and sorting on ID is faster.
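
    gbn's solution is not reproduced in this thread excerpt. Assuming it is the common ROW_NUMBER()-per-partition pattern, the change described above might look like the sketch below. The first statement is an optional sanity check (LAG is available from SQL Server 2012) that counts rows whose DateCreated is earlier than that of the previous ID for the same document; it should return 0 if ordering on ID really is equivalent.

    ```sql
    -- Sanity check: rows where ID order and DateCreated order disagree.
    SELECT COUNT(*) AS out_of_order_rows
    FROM (SELECT DateCreated,
                 LAG(DateCreated) OVER (PARTITION BY DocumentID
                                        ORDER BY ID) AS PrevDateCreated
          FROM dbo.DocumentStatusLogs) AS x
    WHERE x.PrevDateCreated > x.DateCreated;

    -- Assumed shape of the ROW_NUMBER() approach, ordered on ID
    -- instead of DateCreated.
    WITH ranked AS (
        SELECT DocumentID,
               Status,
               DateCreated,
               ROW_NUMBER() OVER (PARTITION BY DocumentID
                                  ORDER BY ID DESC) AS rn
        FROM dbo.DocumentStatusLogs
    )
    SELECT DocumentID, Status, DateCreated
    FROM ranked
    WHERE rn = 1;
    ```

    If the check returns anything other than 0, keep ordering on DateCreated, since the two orderings would then pick different rows for some documents.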
