Get only latest row, grouped by a column

醉酒成梦 2021-01-23 03:27

I have a large dataset of emails sent and their status codes.

ID Recipient           Date       Status
 1 someone@example.com 01/01/2010      1
 2 someone@example.com         


        
5 Answers
  • 2021-01-23 03:53

    You cannot easily do this in a single query, because COUNT(*) is an aggregate function whereas the latest status comes from a specific row. Here is a query to get the latest status for each recipient:

    SELECT M.Recipient, M.Status FROM Messages M
    WHERE M.Date = (SELECT MAX(SUB.Date) FROM Messages SUB
        WHERE SUB.Recipient = M.Recipient)
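
To try the correlated subquery above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are hypothetical (the question's table is truncated), the dates are stored as ISO text so that MAX compares them correctly, and SQLite is only a stand-in for whatever engine the question targets.

```python
import sqlite3

# Hypothetical sample rows: (ID, Recipient, Date, Status).
ROWS = [
    (1, "someone@example.com", "2010-01-01", 1),
    (2, "someone@example.com", "2010-01-02", 2),
    (3, "others@example.com", "2010-01-01", 1),
]


def latest_status(rows):
    """Return (Recipient, Status) of the most recent message per recipient."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages "
        "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
    )
    conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
    # Same shape as the query above; ORDER BY added only for stable output.
    result = conn.execute(
        """
        SELECT M.Recipient, M.Status
        FROM Messages M
        WHERE M.Date = (SELECT MAX(SUB.Date)
                        FROM Messages SUB
                        WHERE SUB.Recipient = M.Recipient)
        ORDER BY M.Recipient
        """
    ).fetchall()
    conn.close()
    return result


print(latest_status(ROWS))
```

Note that if two rows for the same recipient share the latest date, this query returns both of them.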
    
  • 2021-01-23 03:58

    It's not very pretty, but I'd probably just use a couple of subselects:

    SELECT Recipient,
        COUNT(*) EmailCount,
        (SELECT Status
         FROM Messages M2
         WHERE Recipient = M.Recipient
             AND Date = (SELECT MAX(Date)
                         FROM Messages
                         WHERE Recipient = M2.Recipient)) AS LastStatus
    FROM Messages M
    GROUP BY Recipient
    ORDER BY Recipient
    
  • 2021-01-23 03:58

    You can use the ranking functions for this. Something like (not tested):

    WITH MyResults AS
    (
       SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS [row_number]
       FROM Messages
    )
    SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status
    FROM (
        SELECT Recipient, Count(*) EmailCount
        FROM Messages
        GROUP BY Recipient
    ) MyCounts
    INNER JOIN MyResults
    ON MyCounts.Recipient = MyResults.Recipient
    WHERE MyResults.[row_number] = 1
    
  • 2021-01-23 04:08
    SELECT
        M.Recipient,
        C.EmailCount,
        M.Status
    FROM
        (
        SELECT Recipient, Count(*) EmailCount
        FROM Messages
        GROUP BY Recipient
        ) C
        JOIN
        (
        SELECT Recipient, MAX(Date) AS LastDate
        FROM Messages
        GROUP BY Recipient
        ) MD ON C.Recipient = MD.Recipient
        JOIN
        Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
    ORDER BY
        Recipient
    

    I've found that aggregates mostly scale better than ranking functions.
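
One thing to watch with the MAX(Date) join: if a recipient has two rows that share the latest date, the join returns both. The sketch below illustrates this with Python's sqlite3 module and two hypothetical rows, and compares it with a ROW_NUMBER query. The `ID DESC` tiebreaker is an assumption added here to make the chosen row deterministic, and window functions need SQLite 3.25 or later.

```python
import sqlite3


def compare_tie_handling():
    """Return (rows from the MAX(Date) join, rows from ROW_NUMBER) for tied dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages "
        "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
    )
    # Hypothetical rows: both messages share the same (latest) date.
    conn.executemany(
        "INSERT INTO Messages VALUES (?, ?, ?, ?)",
        [
            (1, "tied@example.com", "2010-01-01", 1),
            (2, "tied@example.com", "2010-01-01", 2),
        ],
    )
    # Aggregate join: every row matching the latest date survives.
    max_join = conn.execute(
        """
        SELECT M.Recipient, M.Status
        FROM (SELECT Recipient, MAX(Date) AS LastDate
              FROM Messages
              GROUP BY Recipient) MD
        JOIN Messages M
          ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
        ORDER BY M.ID
        """
    ).fetchall()
    # Ranking: exactly one row per recipient gets rn = 1.
    ranked = conn.execute(
        """
        SELECT Recipient, Status
        FROM (SELECT Recipient, Status,
                     ROW_NUMBER() OVER (PARTITION BY Recipient
                                        ORDER BY Date DESC, ID DESC) AS rn
              FROM Messages) T
        WHERE rn = 1
        """
    ).fetchall()
    conn.close()
    return max_join, ranked


max_join, ranked = compare_tie_handling()
print(len(max_join), len(ranked))
```

If dates are unique per recipient, both approaches return the same rows, so the choice can be made on performance alone.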

  • 2021-01-23 04:09

    This is an example of a 'max per group' query. I think it is easiest to understand by splitting it up into two subqueries and then joining the results.

    The first subquery is what you already have.

    The second subquery uses the window function ROW_NUMBER to number the emails for each recipient, starting with 1 for the most recent, then 2, 3, and so on.

    The results from the first query are then joined with the rows from the second query that have row number 1, i.e. the most recent. Doing it this way guarantees that you get only one row for each recipient, even when there are ties on the date.

    Here is the query:

    SELECT T1.Recipient, T1.EmailCount, T2.Status FROM
    (
        SELECT Recipient, COUNT(*) AS EmailCount
        FROM Messages
        GROUP BY Recipient
    ) T1
    JOIN
    (
        SELECT
            Recipient,
            Status,
            ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
        FROM Messages
    ) T2
    ON T1.Recipient = T2.Recipient AND T2.rn = 1
    

    This gives the following results:

    Recipient            EmailCount  Status  
    others@example.com   2           2       
    someone@example.com  2           1       
    them@example.com     3           1       
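
For anyone who wants to run this locally, here is a rough sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows are invented for illustration, since the question's sample table is truncated; they are merely chosen so that the counts and latest statuses have the same shape as the results above. Dates are stored as ISO text, and the final ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Hypothetical rows: (ID, Recipient, Date, Status). Not the question's real data.
ROWS = [
    (1, "someone@example.com", "2010-01-01", 1),
    (2, "someone@example.com", "2010-01-02", 1),
    (3, "others@example.com", "2010-01-03", 1),
    (4, "others@example.com", "2010-01-04", 2),
    (5, "them@example.com", "2010-01-05", 2),
    (6, "them@example.com", "2010-01-06", 2),
    (7, "them@example.com", "2010-01-07", 1),
]


def count_and_latest(rows):
    """Return (Recipient, EmailCount, latest Status) per recipient."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages "
        "(ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
    )
    conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT T1.Recipient, T1.EmailCount, T2.Status
        FROM (SELECT Recipient, COUNT(*) AS EmailCount
              FROM Messages
              GROUP BY Recipient) T1
        JOIN (SELECT Recipient, Status,
                     ROW_NUMBER() OVER (PARTITION BY Recipient
                                        ORDER BY Date DESC) AS rn
              FROM Messages) T2
          ON T1.Recipient = T2.Recipient AND T2.rn = 1
        ORDER BY T1.Recipient
        """
    ).fetchall()
    conn.close()
    return result


for row in count_and_latest(ROWS):
    print(row)
```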
    