I have a pivot query on a table with millions of rows. Running the query normally, it runs in 2 seconds and returns 2983 rows. If I add TOP 1000 to the query it takes 10 secon
After some searching on how to hint an execution plan, I found a solution:
SELECT TOP 1000 *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      JOIN (SELECT MAX(LabDate) maxDate,
                   PatientID,
                   LabID
            FROM dbo.Labs
            GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                        AND l.LabID = s.LabID
                                        AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
OPTION (HASH JOIN)
OPTION (HASH JOIN) is the part that matters. With it, the execution plan for the version with TOP looks like the original plan without TOP, with a Top operator tacked on at the end.
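One way to check the effect of the hint is to run both variants with timing and I/O statistics switched on and compare the output in the Messages tab. The sketch below uses only the inner "latest lab per patient" join from the query above, not the full PIVOT, so its numbers are indicative only:

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant 1: TOP without a hint (the optimizer picks the join type)
SELECT TOP 1000 l.PatientID, l.LabID, l.Result
FROM dbo.Labs l
JOIN (SELECT MAX(LabDate) maxDate, PatientID, LabID
      FROM dbo.Labs
      GROUP BY PatientID, LabID) s
  ON l.PatientID = s.PatientID
 AND l.LabID = s.LabID
 AND l.LabDate = s.maxDate;

-- Variant 2: same query, hash joins requested for the whole statement
SELECT TOP 1000 l.PatientID, l.LabID, l.Result
FROM dbo.Labs l
JOIN (SELECT MAX(LabDate) maxDate, PatientID, LabID
      FROM dbo.Labs
      GROUP BY PatientID, LabID) s
  ON l.PatientID = s.PatientID
 AND l.LabID = s.LabID
 AND l.LabDate = s.maxDate
OPTION (HASH JOIN);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```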
Since I was originally doing this in a view, and a view definition cannot contain an OPTION clause, what I actually ended up doing was changing JOIN to INNER HASH JOIN.
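As a rough sketch of that change, assuming the view simply wraps the same pivot query: the view name dbo.vLatestLabsPivot is made up for illustration, and the only difference from the query above is that the hint moves from the OPTION clause onto the join itself.

```sql
-- Hypothetical view name; the body mirrors the query above.
CREATE VIEW dbo.vLatestLabsPivot
AS
SELECT *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      -- Join hint in place of the statement-level OPTION (HASH JOIN)
      INNER HASH JOIN (SELECT MAX(LabDate) maxDate,
                              PatientID,
                              LabID
                       FROM dbo.Labs
                       GROUP BY PatientID, LabID) s
        ON l.PatientID = s.PatientID
       AND l.LabID = s.LabID
       AND l.LabDate = s.maxDate) A
PIVOT (MIN(Result) FOR LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p;
GO

-- TOP is then applied by the caller
SELECT TOP 1000 *
FROM dbo.vLatestLabsPivot;
```

Note that a join hint also fixes the join order for the query, so it is worth checking the plan again after the change.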
SELECT TOP 1000
    *
FROM (
    SELECT patientId, labId, result,
        DENSE_RANK() OVER (PARTITION BY patientId, labId ORDER BY labDate DESC) dr
    FROM labs
) q
PIVOT (
    MIN(result)
    FOR
    labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
WHERE dr = 1
ORDER BY
    patientId
You may also try creating an indexed view like this:
CREATE VIEW
v_labs_patient_lab
WITH SCHEMABINDING
AS
SELECT patientId, labId, COUNT_BIG(*) AS cnt
FROM dbo.labs
GROUP BY
patientId, labId
GO
-- GO is the SSMS/sqlcmd batch separator: CREATE VIEW must be the only statement in its batch
CREATE UNIQUE CLUSTERED INDEX
ux_labs_patient_lab
ON v_labs_patient_lab (patientId, labId)
and use it in the query:
SELECT TOP 1000
    *
FROM (
    -- patientId and labId come from the view; the APPLY only returns result
    SELECT vl.patientId, vl.labId, lr.result
    FROM v_labs_patient_lab vl
    CROSS APPLY
    (
        SELECT TOP 1 WITH TIES
            result
        FROM labs l
        WHERE l.patientId = vl.patientId
            AND l.labId = vl.labId
        ORDER BY
            l.labDate DESC
    ) lr
) q
PIVOT (
    MIN(result)
    FOR
    labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
ORDER BY
    patientId
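The CROSS APPLY above performs one TOP 1 WITH TIES lookup per (patientId, labId) pair, so it generally depends on an index that can seek straight to the latest labDate for that pair. A possible sketch, assuming no such index exists yet (the index name ix_labs_patient_lab_date is made up):

```sql
-- Hypothetical supporting index: seek on (patientId, labId), newest labDate first.
-- result is included so the lookup does not have to touch the base table.
CREATE INDEX ix_labs_patient_lab_date
ON dbo.labs (patientId, labId, labDate DESC)
INCLUDE (result);
```

Depending on the edition, the optimizer may expand the view instead of using its index; adding the WITH (NOEXPAND) table hint to v_labs_patient_lab asks it to read the view index directly.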
There is a specific order in which queries are processed.
A normal SQL query will be written as follows:
SELECT [...]
FROM [table1]
JOIN [table2]
ON [condition]
WHERE [...]
GROUP BY [...]
HAVING [...]
ORDER BY [...]
But the logical processing order is different:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
When using SELECT DISTINCT [...] or SELECT TOP [...], the processing order will be as follows:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
DISTINCT [...]
ORDER BY [...]
TOP [...]
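Hence your SELECT TOP 1000 is logically applied last, after the rest of the query has been evaluated. This is the logical order only: the optimizer may still choose a different physical plan when TOP is present, which is where the extra time can come from.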
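A small self-contained check of that logical order, using a made-up table variable @t: DISTINCT is applied before ORDER BY and TOP, so the duplicates are removed first and TOP 2 then keeps the two lowest distinct values.

```sql
-- Hypothetical sample data: the value 1 appears twice
DECLARE @t TABLE (id INT);
INSERT INTO @t (id) VALUES (1), (1), (2), (3);

-- Logical order: FROM -> SELECT -> DISTINCT -> ORDER BY -> TOP
-- DISTINCT leaves 1, 2, 3; ORDER BY sorts them; TOP 2 keeps 1 and 2.
-- If TOP were applied before DISTINCT, the two rows with id = 1
-- would collapse into a single row instead.
SELECT DISTINCT TOP 2 id
FROM @t
ORDER BY id;
```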
Take a look at this link for further details: http://blogs.msdn.com/b/sqlqueryprocessing/