I have a table which I want to get the latest entry for each group. Here\'s the table:
DocumentStatusLogs
Table
|ID| DocumentID | Status
I believe this can be done just like this. This might need some tweaking but you can just select the max from the group.
These answers are overkill..
SELECT
d.DocumentID,
MAX(d.Status),
MAX(d1.DateCreated)
FROM DocumentStatusLogs d, DocumentStatusLogs d1
USING(DocumentID)
GROUP BY d.DocumentID
ORDER BY DateCreated DESC
Here are 3 separate approaches to the problem in hand along with the best choices of indexing for each of those queries (please try out the indexes yourselves and see the logical read, elapsed time, execution plan. I have provided the suggestions from my experience on such queries without executing for this specific problem).
Approach 1: Using ROW_NUMBER(). If rowstore index is not being able to enhance the performance, you can try out nonclustered/clustered columnstore index as for queries with aggregation and grouping and for tables which are ordered by in different columns all the times, columnstore index usually is the best choice.
;WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
FROM DocumentStatusLogs
)
SELECT ID
,DocumentID
,Status
,DateCreated
FROM CTE
WHERE RN = 1;
Approach 2: Using FIRST_VALUE. If rowstore index is not being able to enhance the performance, you can try out nonclustered/clustered columnstore index as for queries with aggregation and grouping and for tables which are ordered by in different columns all the times, columnstore index usually is the best choice.
SELECT DISTINCT
ID = FIRST_VALUE(ID) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
,DocumentID
,Status = FIRST_VALUE(Status) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
,DateCreated = FIRST_VALUE(DateCreated) OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)
FROM DocumentStatusLogs;
Approach 3: Using CROSS APPLY. Creating rowstore index on DocumentStatusLogs table covering the columns used in the query should be enough to cover the query without need of a columnstore index.
SELECT DISTINCT
ID = CA.ID
,DocumentID = D.DocumentID
,Status = CA.Status
,DateCreated = CA.DateCreated
FROM DocumentStatusLogs D
CROSS APPLY (
SELECT TOP 1 I.*
FROM DocumentStatusLogs I
WHERE I.DocumentID = D.DocumentID
ORDER BY I.DateCreated DESC
) CA;
SELECT * FROM
DocumentStatusLogs JOIN (
SELECT DocumentID, MAX(DateCreated) DateCreated
FROM DocumentStatusLogs
GROUP BY DocumentID
) max_date USING (DocumentID, DateCreated)
What database server? This code doesn't work on all of them.
Regarding the second half of your question, it seems reasonable to me to include the status as a column. You can leave DocumentStatusLogs
as a log, but still store the latest info in the main table.
BTW, if you already have the DateCreated
column in the Documents table you can just join DocumentStatusLogs
using that (as long as DateCreated
is unique in DocumentStatusLogs
).
Edit: MsSQL does not support USING, so change it to:
ON DocumentStatusLogs.DocumentID = max_date.DocumentID AND DocumentStatusLogs.DateCreated = max_date.DateCreated
CROSS APPLY
was the method I used for my solution, as it worked for me, and for my clients needs. And from what I've read, should provide the best overall performance should their database grow substantially.
This is the most vanilla TSQL I can come up with
SELECT * FROM DocumentStatusLogs D1 JOIN
(
SELECT
DocumentID,MAX(DateCreated) AS MaxDate
FROM
DocumentStatusLogs
GROUP BY
DocumentID
) D2
ON
D2.DocumentID=D1.DocumentID
AND
D2.MaxDate=D1.DateCreated
My code to select top 1 from each group
select a.* from #DocumentStatusLogs a where datecreated in( select top 1 datecreated from #DocumentStatusLogs b where a.documentid = b.documentid order by datecreated desc )