I have a large data-set of emails sent and status-codes.
ID Recipient Date Status
1 someone@example.com 01/01/2010 1
2 someone@example.com
You cannot easily do this is a single query because count(*) is a group function whereas the latest status comes from a specific row. Here is the query to get the latest status for each user:
SELECT M.Recipient, M.Status FROM Messages M
WHERE M.Date = (SELECT MAX(SUB.Date) FROM MESSAGES SUB
WHERE SUB.Recipient = M.Recipient)
It's not very pretty, but I'd probably just use a couple of subselects:
SELECT Recipient,
COUNT(*) EmailCount,
(SELECT Status
FROM Messages M2
WHERE Recipient = M.Recipient
AND Date = (SELECT MAX(Date)
FROM Messages
WHERE Recipient = M2.Recipient))
FROM Messages M
GROUP BY Recipient
ORDER BY Recipient
You can use the ranking functions for this. Something like (not tested):
WITH MyResults AS
(
SELECT Recipient, Status, ROW_NUMBER() OVER( Recipient ORDER BY ( [date] DESC ) ) AS [row_number]
FROM Messages
)
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status
FROM (
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
) MyCounts
INNER JOIN MyResults
ON MyCounts.Recipient = MyResults.Recipient
WHERE MyResults.[row_number] = 1
SELECT
M.Recipient,
C.EmailCount,
M.Status
FROM
(
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
) C
JOIN
(
SELECT Recipient, MAX(Date) AS LastDate
FROM Messages
GROUP BY Recipient
) MD ON C.Recipient = MD.Recipient
JOIN
Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY
Recipient
I've found aggregates mostly scale better then ranking functions
This is an example of a 'max per group' query. I think it is easiest to understand by splitting it up into two subqueries and then joining the results.
The first subquery is what you already have.
The second subquery uses the windowing function ROW_NUMBER to number the emails for each recipient starting with 1 for the most recent, then 2, 3, etc...
The results from the first query are then joined with the result from the second query that has row number 1, i.e. the most recent. Doing it this way guarantees that you will only get one row for each recipient in the case that there are ties.
Here is the query:
SELECT T1.Recipient, T1.EmailCount, T2.Status FROM
(
SELECT Recipient, COUNT(*) AS EmailCount
FROM Messages
GROUP BY Recipient
) T1
JOIN
(
SELECT
Recipient,
Status,
ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date Desc) AS rn
FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1
This gives the following results:
Recipient EmailCount Status
others@example.com 2 2
someone@example.com 2 1
them@example.com 3 1