Get top 1 row of each group

余生分开走 2020-11-21 04:42

I have a table from which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

|ID| DocumentID | Status         


        
20 Answers
  • 2020-11-21 05:22

    I believe this can be done as shown below. It might need some tweaking, but you can just select the max from the group.

    These answers are overkill.

    SELECT
      d.DocumentID,
      MAX(d.Status) AS Status,            -- largest Status in the group, not necessarily the latest row's
      MAX(d.DateCreated) AS DateCreated
    FROM DocumentStatusLogs d
    GROUP BY d.DocumentID
    ORDER BY MAX(d.DateCreated) DESC
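
    One thing worth checking before relying on per-column aggregates: each MAX() is computed independently, so the Status returned need not belong to the row with the latest DateCreated. Below is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for the real server. The rows are made up for illustration and are not from the question.

```python
import sqlite3

# Hypothetical sample rows (ID, DocumentID, Status, DateCreated); not from the question.
rows = [
    (1, 1, "S2", "2011-07-29"),
    (2, 1, "S1", "2011-07-30"),  # latest row for document 1, but the smaller Status
    (3, 2, "S1", "2011-08-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", rows)

# Each MAX() is evaluated on its own, so the two values can come from different rows.
per_column_max = conn.execute(
    """
    SELECT DocumentID, MAX(Status), MAX(DateCreated)
    FROM DocumentStatusLogs
    GROUP BY DocumentID
    ORDER BY DocumentID
    """
).fetchall()

for row in per_column_max:
    print(row)
```

    With these rows, document 1 comes back with Status S2 even though its latest row has S1, so the shortcut is only safe when Status always increases together with DateCreated.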
    
  • 2020-11-21 05:25

    Here are three separate approaches to the problem at hand, along with indexing suggestions for each query. Please try the indexes yourself and compare logical reads, elapsed time, and execution plans. The suggestions come from my experience with similar queries; I have not executed them for this specific problem.

    Approach 1: Using ROW_NUMBER(). If a rowstore index does not improve performance, try a nonclustered or clustered columnstore index. For queries with aggregation and grouping, and for tables that are ordered by different columns from one query to the next, a columnstore index is usually the best choice.

    ;WITH CTE AS
    (
        SELECT  *,
                RN = ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
        FROM    DocumentStatusLogs
    )
    SELECT  ID
           ,DocumentID
           ,Status
           ,DateCreated
    FROM    CTE
    WHERE   RN = 1;
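
    For anyone who wants to try the ROW_NUMBER() approach without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later), uses `AS RN` instead of the T-SQL `RN = ...` alias form, and runs against made-up sample rows, so treat it as an illustration rather than a benchmark.

```python
import sqlite3

# Hypothetical sample rows (ID, DocumentID, Status, DateCreated); not from the question.
rows = [
    (1, 1, "S1", "2011-07-29"),
    (2, 1, "S2", "2011-07-30"),
    (3, 2, "S1", "2011-08-01"),
    (4, 2, "S3", "2011-08-02"),
    (5, 3, "S1", "2011-08-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", rows)

# Number the rows inside each DocumentID from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    WITH Ranked AS (
        SELECT ID, DocumentID, Status, DateCreated,
               ROW_NUMBER() OVER (
                   PARTITION BY DocumentID
                   ORDER BY DateCreated DESC
               ) AS RN
        FROM DocumentStatusLogs
    )
    SELECT ID, DocumentID, Status, DateCreated
    FROM Ranked
    WHERE RN = 1
    ORDER BY DocumentID
    """
).fetchall()

for row in latest:
    print(row)
```

    Each DocumentID yields exactly one row. If two rows of a document share the same DateCreated, ROW_NUMBER() still returns only one of them, and which one is not defined unless a tie-breaker such as ID is added to the ORDER BY.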
    

    Approach 2: Using FIRST_VALUE. If a rowstore index does not improve performance, try a nonclustered or clustered columnstore index. For queries with aggregation and grouping, and for tables that are ordered by different columns from one query to the next, a columnstore index is usually the best choice.

    SELECT  DISTINCT
        ID      = FIRST_VALUE(ID) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
        ,DocumentID
        ,Status     = FIRST_VALUE(Status) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
        ,DateCreated    = FIRST_VALUE(DateCreated) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
    FROM    DocumentStatusLogs;
    

    Approach 3: Using CROSS APPLY. A rowstore index on the DocumentStatusLogs table that covers the columns used in the query should be enough, without the need for a columnstore index.

    SELECT  DISTINCT
        ID      = CA.ID
        ,DocumentID = D.DocumentID
        ,Status     = CA.Status 
        ,DateCreated    = CA.DateCreated
    FROM    DocumentStatusLogs D
        CROSS APPLY (
                SELECT  TOP 1 I.*
                FROM    DocumentStatusLogs I
                WHERE   I.DocumentID = D.DocumentID
                ORDER   BY I.DateCreated DESC
                ) CA;
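
    SQLite has neither CROSS APPLY nor TOP, so the sketch below swaps in a correlated subquery with LIMIT 1, which picks one row per DocumentID in the same spirit. The index name IX_DocumentStatusLogs_Doc_Date and the sample rows are hypothetical. The index on (DocumentID, DateCreated) is the kind of rowstore index Approach 3 has in mind, though whether the optimizer actually uses it should be checked on your own data.

```python
import sqlite3

# Hypothetical sample rows (ID, DocumentID, Status, DateCreated); not from the question.
rows = [
    (1, 1, "S1", "2011-07-29"),
    (2, 1, "S2", "2011-07-30"),
    (3, 2, "S1", "2011-08-01"),
    (4, 2, "S3", "2011-08-02"),
    (5, 3, "S1", "2011-08-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", rows)

# Hypothetical index supporting the per-document "newest first" lookup.
conn.execute(
    "CREATE INDEX IX_DocumentStatusLogs_Doc_Date "
    "ON DocumentStatusLogs (DocumentID, DateCreated DESC)"
)

# Correlated subquery standing in for CROSS APPLY (SELECT TOP 1 ...).
# The extra "ID DESC" is an added tie-breaker for rows sharing a DateCreated.
latest = conn.execute(
    """
    SELECT d.ID, d.DocumentID, d.Status, d.DateCreated
    FROM DocumentStatusLogs AS d
    WHERE d.ID = (
        SELECT i.ID
        FROM DocumentStatusLogs AS i
        WHERE i.DocumentID = d.DocumentID
        ORDER BY i.DateCreated DESC, i.ID DESC
        LIMIT 1
    )
    ORDER BY d.DocumentID
    """
).fetchall()

for row in latest:
    print(row)
```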
    
  • 2020-11-21 05:26
    SELECT * FROM
    DocumentStatusLogs JOIN (
      SELECT DocumentID, MAX(DateCreated) DateCreated
      FROM DocumentStatusLogs
      GROUP BY DocumentID
      ) max_date USING (DocumentID, DateCreated)
    

    What database server? This code doesn't work on all of them.

    Regarding the second half of your question, it seems reasonable to me to include the status as a column. You can leave DocumentStatusLogs as a log, but still store the latest info in the main table.

    BTW, if you already have the DateCreated column in the Documents table you can just join DocumentStatusLogs using that (as long as DateCreated is unique in DocumentStatusLogs).

    Edit: SQL Server (MSSQL) does not support USING, so change the join condition to:

    ON DocumentStatusLogs.DocumentID = max_date.DocumentID AND DocumentStatusLogs.DateCreated = max_date.DateCreated
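
    The uniqueness caveat matters for this join: if two rows of one document share the maximum DateCreated, both match and the document appears twice. Here is a minimal sketch with Python's built-in sqlite3 module, using the ON form of the join and made-up rows in which document 1 has such a tie. The alias names d and m are only for illustration.

```python
import sqlite3

# Hypothetical sample rows (ID, DocumentID, Status, DateCreated); not from the question.
rows = [
    (1, 1, "S1", "2011-07-29"),
    (2, 1, "S2", "2011-07-30"),
    (3, 1, "S3", "2011-07-30"),  # same DateCreated as ID 2: a tie for document 1
    (4, 2, "S1", "2011-08-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", rows)

# Join every row to the per-document maximum date; tied rows all survive the join.
joined = conn.execute(
    """
    SELECT d.ID, d.DocumentID, d.Status, d.DateCreated
    FROM DocumentStatusLogs AS d
    JOIN (
        SELECT DocumentID, MAX(DateCreated) AS MaxDate
        FROM DocumentStatusLogs
        GROUP BY DocumentID
    ) AS m
      ON d.DocumentID = m.DocumentID
     AND d.DateCreated = m.MaxDate
    ORDER BY d.ID
    """
).fetchall()

for row in joined:
    print(row)
```

    With these rows the result has three entries for two documents, because IDs 2 and 3 both carry the latest date for document 1.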
    
  • 2020-11-21 05:27

    CROSS APPLY was the method I used for my solution, as it worked for me and for my client's needs. From what I've read, it should also provide the best overall performance if their database grows substantially.

  • 2020-11-21 05:27

    This is the most vanilla T-SQL I can come up with:

        SELECT * FROM DocumentStatusLogs D1 JOIN
        (
          SELECT
            DocumentID,MAX(DateCreated) AS MaxDate
          FROM
            DocumentStatusLogs
          GROUP BY
            DocumentID
        ) D2
        ON
          D2.DocumentID=D1.DocumentID
        AND
          D2.MaxDate=D1.DateCreated
    
  • 2020-11-21 05:30

    My code to select the top 1 row from each group:

    select a.*
    from #DocumentStatusLogs a
    where a.datecreated in (
        select top 1 b.datecreated
        from #DocumentStatusLogs b
        where a.documentid = b.documentid
        order by b.datecreated desc
    )
    