TOP slows down query

自闭症患者 2021-01-21 18:28

I have a pivot query on a table with millions of rows. Running the query normally, it runs in 2 seconds and returns 2983 rows. If I add TOP 1000 to the query it takes 10 secon

3 answers
  • 2021-01-21 18:50

    After some searching on how to hint the execution plan, I found the solution.

    SELECT TOP 1000 * 
    FROM (SELECT l.PatientID,
                   l.LabID,
                   l.Result
              FROM dbo.Labs l
              JOIN (SELECT MAX(LabDate) maxDate, 
                           PatientID, 
                           LabID 
                      FROM dbo.Labs 
                  GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                              AND l.LabID = s.LabID
                                              AND l.LabDate = s.maxDate) A
    PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
    OPTION (HASH JOIN)
    

    The key is OPTION (HASH JOIN). With it, the execution plan for the version with TOP looks like the original plan without TOP, with a Top operator tacked on at the end.
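
    One way to check this kind of difference is to run both versions with session statistics switched on and compare the reported times and reads (in SSMS, "Include Actual Execution Plan" shows the plans side by side). This is a minimal sketch, assuming a simplified query against dbo.Labs rather than the full pivot:

    ```sql
    -- Report CPU/elapsed time and logical reads for each statement.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- Version 1: TOP without a join hint (the optimizer picks the join type).
    SELECT TOP 1000 l.PatientID, l.LabID, l.Result
    FROM dbo.Labs l
    JOIN (SELECT PatientID, LabID, MAX(LabDate) AS maxDate
          FROM dbo.Labs
          GROUP BY PatientID, LabID) s
      ON l.PatientID = s.PatientID
     AND l.LabID = s.LabID
     AND l.LabDate = s.maxDate;

    -- Version 2: same query, but every join must be a hash join.
    SELECT TOP 1000 l.PatientID, l.LabID, l.Result
    FROM dbo.Labs l
    JOIN (SELECT PatientID, LabID, MAX(LabDate) AS maxDate
          FROM dbo.Labs
          GROUP BY PatientID, LabID) s
      ON l.PatientID = s.PatientID
     AND l.LabID = s.LabID
     AND l.LabDate = s.maxDate
    OPTION (HASH JOIN);

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    ```

    The numbers appear on the Messages tab; whether the hinted version wins depends on the data and indexes.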

    Since I was originally doing this in a view, what I actually ended up doing was changing JOIN to INNER HASH JOIN.
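
    For reference, a view definition cannot carry an OPTION clause, which is why the hint has to move into the join itself. A rough sketch of what such a view could look like follows; the view name dbo.v_latest_labs_pivot is hypothetical, and the body just mirrors the query above with the join-level hint:

    ```sql
    -- Hypothetical view name; the body mirrors the query above.
    CREATE VIEW dbo.v_latest_labs_pivot
    AS
    SELECT p.*
    FROM (SELECT l.PatientID,
                 l.LabID,
                 l.Result
          FROM dbo.Labs l
          -- Join-level hint instead of OPTION (HASH JOIN)
          INNER HASH JOIN (SELECT MAX(LabDate) AS maxDate,
                                  PatientID,
                                  LabID
                           FROM dbo.Labs
                           GROUP BY PatientID, LabID) s
              ON l.PatientID = s.PatientID
             AND l.LabID = s.LabID
             AND l.LabDate = s.maxDate) A
    PIVOT (MIN(Result) FOR LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p;
    GO

    -- TOP is then applied by the caller.
    SELECT TOP 1000 * FROM dbo.v_latest_labs_pivot;
    ```

    Note that a join hint written this way also fixes the join order for the query, so it is worth re-checking the plan if more tables are joined later.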

  • 2021-01-21 18:55
    SELECT  TOP 1000
            *
    FROM    (
            SELECT  patientId, labId, result,
                    DENSE_RANK() OVER (PARTITION BY patientId, labId ORDER BY labDate DESC) dr
            FROM    labs
            ) q
    PIVOT   (
            MIN(result)
            FOR
            labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
            ) p
    WHERE   dr = 1
    ORDER BY
            patientId
    

    You may also try creating an indexed view like this:

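    -- CREATE VIEW must be the only statement in its batch, hence the GO separator
    CREATE VIEW
            v_labs_patient_lab
    WITH SCHEMABINDING
    AS
    SELECT  patientId, labId, COUNT_BIG(*) AS cnt
    FROM    dbo.labs
    GROUP BY
            patientId, labId
    GO

    -- The unique clustered index is what materializes the view
    CREATE UNIQUE CLUSTERED INDEX
            ux_labs_patient_lab
    ON      v_labs_patient_lab (patientId, labId)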
    

    and use it in the query:

    SELECT  TOP 1000
            *
    FROM    (
            SELECT  vl.patientId, vl.labId, lr.result
            FROM    v_labs_patient_lab vl
            CROSS APPLY
                    (
                    SELECT TOP 1 WITH TIES
                           result
                    FROM   labs l
                    WHERE  l.patientId = vl.patientId
                           AND l.labId = vl.labId
                    ORDER BY
                           l.labDate DESC
                    ) lr
            ) q
    PIVOT   (
            MIN(result)
            FOR
            labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
            ) p
    ORDER BY
            patientId
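
    Depending on the SQL Server edition, the optimizer may expand the view back into its base query and ignore the index on it. A hedged variant of the inner query that references the view with the NOEXPAND table hint is sketched below; the supporting index on labs and its name ix_labs_patient_lab_date are assumptions for illustration, not something the original setup is known to have:

    ```sql
    -- Assumed supporting index so each CROSS APPLY lookup is a seek
    -- (hypothetical name; adjust to the real schema).
    CREATE INDEX ix_labs_patient_lab_date
    ON dbo.labs (patientId, labId, labDate DESC)
    INCLUDE (result);
    GO

    -- Read the indexed view directly instead of letting it be expanded.
    SELECT  vl.patientId, vl.labId, lr.result
    FROM    v_labs_patient_lab vl WITH (NOEXPAND)
    CROSS APPLY
            (
            -- Latest result(s) for this patient/lab pair
            SELECT TOP 1 WITH TIES
                   l.result
            FROM   dbo.labs l
            WHERE  l.patientId = vl.patientId
                   AND l.labId = vl.labId
            ORDER BY
                   l.labDate DESC
            ) lr;
    ```

    This derived table can be dropped into the PIVOT query above in place of the original inner SELECT.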
    
  • 2021-01-21 18:57

    The clauses of a query are logically processed in a specific order.

    A normal SQL query will be written as follows:

    SELECT [...]
      FROM [table1]
      JOIN [table2]
        ON [condition]
     WHERE [...]
     GROUP BY [...]
    HAVING [...]
     ORDER BY [...]
    

    But the logical processing order is different:

    FROM [table1]
        ON [condition]
      JOIN [table2]
     WHERE [...]
     GROUP BY [...]
    HAVING [...]
    SELECT [...]
     ORDER BY [...]
    

    When using SELECT DISTINCT [...] or SELECT TOP [...], the logical processing order is as follows:

    FROM [table1]
        ON [condition]
      JOIN [table2]
     WHERE [...]
     GROUP BY [...]
    HAVING [...]
    SELECT [...] DISTINCT[...]
    ORDER BY [...]
    TOP [....]
    

    Hence it may be taking longer: your SELECT TOP 1000 is logically processed last, after everything else in the query.
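
    The logical order above can be seen in a small example. The table variable @t and its values are made up purely for illustration:

    ```sql
    -- Hypothetical sample data, used only to illustrate the ordering.
    DECLARE @t TABLE (id int, amount int);
    INSERT INTO @t (id, amount) VALUES (1, 10), (2, 30), (3, 20);

    -- Fails if uncommented: WHERE is processed before SELECT,
    -- so the alias "doubled" does not exist yet at that point.
    -- SELECT id, amount * 2 AS doubled FROM @t WHERE doubled > 15;

    -- Works: ORDER BY is processed after SELECT, so it can see the alias,
    -- and TOP is applied last, to the already ordered rows.
    SELECT TOP 2 id, amount * 2 AS doubled
    FROM @t
    WHERE amount * 2 > 15
    ORDER BY doubled DESC;
    -- Returns (2, 60) and (3, 40)
    ```

    This describes the logical result only; the physical execution plan the optimizer chooses may evaluate things in a different way, as long as the result is the same.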

    Take a look at this link for further details: http://blogs.msdn.com/b/sqlqueryprocessing/
