SQL: why is SELECT COUNT(*), MIN(col), MAX(col) faster than SELECT MIN(col), MAX(col)?

青春惊慌失措 2020-12-09 04:30

We're seeing a huge difference between these queries.

The slow query

SELECT MIN(col) AS Firstdate, MAX(col) AS Lastdate 
F         


        
1 answer
  • 2020-12-09 05:04

    The SQL Server cardinality estimator makes various modelling assumptions such as

    • Independence: Data distributions on different columns are independent unless correlation information is available.
    • Uniformity: Within each statistics object histogram step, distinct values are evenly spread and each value has the same frequency.

    Source

    There are 810,064 rows in the table.

    You have the query

    SELECT COUNT(*),
           MIN(startdate) AS Firstdate,
           MAX(startdate) AS Lastdate
    FROM   table
    WHERE  status <> 'A'
           AND fk = 4193 
    

    1,893 rows (0.23%) meet the fk = 4193 predicate, and two of those fail the status <> 'A' part, so 1,891 rows match overall and need to be aggregated.

    You also have two indexes, neither of which covers the whole query.

    Your fast query uses the index on fk to directly find the rows where fk = 4193, then does 1,893 key lookups into the clustered index to check the status predicate and retrieve the startdate for aggregation.

    When you remove the COUNT(*) from the SELECT list, SQL Server no longer has to process every qualifying row. As a result, it considers another option.

    You have an index on startdate, so it could scan that index from the beginning, doing key lookups back to the base table, and stop as soon as it finds the first matching row, because that row holds the MIN(startdate). Similarly, the MAX can be found with another scan starting at the other end of the index and working backwards.

    SQL Server estimates that each of these scans will process 590 rows before it hits one that matches the predicate. That gives 1,180 total lookups versus 1,893, so it chooses this plan.

    The 590 figure is just table_size / estimated_number_of_rows_that_match, i.e. the cardinality estimator assumes that the matching rows are evenly distributed throughout the table.
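
    The arithmetic behind that comparison can be checked with a small standalone batch. This is only a sketch using the figures quoted in this answer; the variable names are made up for illustration, and the "implied estimate" is back-calculated from the 590 figure rather than read from the actual plan.

    ```sql
    -- Figures quoted in the answer; variable names are hypothetical.
    DECLARE @table_rows    int = 810064;  -- rows in the table
    DECLARE @rows_per_scan int = 590;     -- estimated rows read by each ordered scan
    DECLARE @seek_lookups  int = 1893;    -- key lookups in the plan that seeks on fk

    SELECT CAST(@table_rows AS float) / @rows_per_scan AS implied_matching_row_estimate, -- roughly 1,373
           2 * @rows_per_scan                          AS estimated_lookups_two_scans,   -- 1,180
           @seek_lookups                               AS lookups_fk_seek_plan;          -- 1,893
    ```

    On these numbers the two-scan plan looks cheaper, which is consistent with the optimizer's choice described above.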

    Unfortunately, the 1,891 rows that meet the predicate are not randomly distributed with respect to startdate. In fact, they are all concentrated in a single 8,205-row segment towards the end of the index, meaning that the scan for the MIN(startdate) ends up doing 801,859 key lookups before it can stop.

    This can be reproduced below.

    CREATE TABLE T
    (
    id int identity(1,1) primary key,
    startdate datetime,
    fk int,
    [status] char(1),
    Filler char(2000)
    )
    
    CREATE NONCLUSTERED INDEX ix ON T(startdate)
    
    INSERT INTO T
    SELECT TOP 810064 Getdate() - 1,
                      4192,
                      'B',
                      ''
    FROM   sys.all_columns c1,
           sys.all_columns c2  
    
    
    UPDATE T 
    SET fk = 4193, startdate = GETDATE()
    WHERE id BETWEEN 801859 and 803748 or id = 810064
    
    UPDATE T 
    SET  startdate = GETDATE() + 1
    WHERE id > 810064
    
    
    /*Both queries give the same plan. 
    UPDATE STATISTICS T WITH FULLSCAN
    makes no difference*/
    
    SELECT MIN(startdate) AS Firstdate, 
           MAX(startdate) AS Lastdate 
    FROM T
    WHERE status <> 'A' AND fk = 4192
    
    
    SELECT MIN(startdate) AS Firstdate, 
           MAX(startdate) AS Lastdate 
    FROM T
    WHERE status <> 'A' AND fk = 4193
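
    To see what the two query shapes actually cost against the repro table T, one option is to wrap them in SET STATISTICS IO and SET STATISTICS TIME and compare the logical reads and elapsed time reported in the messages output. This is a sketch of how one might measure it, not part of the original repro; the numbers you get will depend on your server and on which indexes exist on T.

    ```sql
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Without COUNT(*): eligible for the two ordered scans of the startdate index.
    SELECT MIN(startdate) AS Firstdate,
           MAX(startdate) AS Lastdate
    FROM   T
    WHERE  [status] <> 'A'
           AND fk = 4193;

    -- With COUNT(*): every qualifying row has to be processed.
    SELECT COUNT(*)       AS MatchingRows,
           MIN(startdate) AS Firstdate,
           MAX(startdate) AS Lastdate
    FROM   T
    WHERE  [status] <> 'A'
           AND fk = 4193;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    ```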
    

    To avoid this issue, you could use query hints to force the plan to use the index on fk rather than the one on startdate, or add the missing index suggested in the execution plan, on (fk, status) INCLUDE (startdate).
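
    Both options might look roughly like this against the repro table T. The index names ix_fk and ix_fk_status are hypothetical, and because the repro script does not create an index on fk, the sketch creates one so that the hint has something to reference.

    ```sql
    -- Option 1: covering index, so neither key lookups nor the startdate scan are needed.
    -- (ix_fk_status is a made-up name.)
    CREATE NONCLUSTERED INDEX ix_fk_status
        ON T (fk, [status])
        INCLUDE (startdate);

    -- Option 2: table hint that forces a plain index on fk.
    -- (ix_fk is a made-up name, created here only so the hint is valid.)
    CREATE NONCLUSTERED INDEX ix_fk ON T (fk);

    SELECT MIN(startdate) AS Firstdate,
           MAX(startdate) AS Lastdate
    FROM   T WITH (INDEX (ix_fk))
    WHERE  [status] <> 'A'
           AND fk = 4193;
    ```

    The covering index is usually the less brittle of the two, since a hint keeps forcing the same index even if the data distribution later changes.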
