Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values

滥情空心 2020-12-18 05:02

We have found that SQL Server uses an index scan instead of an index seek when the WHERE clause contains parameterized values instead of string literals.

4 Answers
  • 2020-12-18 05:29

    I guess the first query is using a Predicate and the second query is using a Seek Predicate.

    Seek Predicate is the operation that describes the B-tree portion of the seek. Predicate is the operation that describes an additional (residual) filter applied to the rows the operator reads, typically on non-key columns. Based on this description, a Seek Predicate is generally better than a Predicate: it uses the index to navigate directly to the matching rows, whereas a Predicate has to be evaluated against every row that is read, which for a scan means reading the data pages themselves.

    For more details, see: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/36a176c8-005e-4a7d-afc2-68071f33987a/predicate-and-seek-predicate
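    A minimal sketch of how the two show up in an actual execution plan (view it with "Include Actual Execution Plan" in SSMS, or with SET STATISTICS XML ON as below). The table #P, its columns and the index IX_P__Val are hypothetical names made up for this illustration, and the exact operators chosen may vary with version and data.

    ```tsql
    -- Hypothetical demo table: #P, Val, Other and IX_P__Val are illustrative names.
    IF OBJECT_ID(N'tempdb..#P', 'U') IS NOT NULL
        DROP TABLE #P;

    CREATE TABLE #P (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Other INT NOT NULL);

    -- 1000 rows: Val cycles through 0-9, Other through 0-6.
    INSERT #P (Val, Other)
    SELECT  n % 10, n % 7
    FROM    (SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.object_id) AS n
             FROM sys.all_objects a) AS nums;

    -- Val is the index key; Other is only stored at the leaf level.
    CREATE NONCLUSTERED INDEX IX_P__Val ON #P (Val) INCLUDE (Other);

    SET STATISTICS XML ON;

    -- Index seek: Val = 2 appears as a Seek Predicate (navigates the B-tree).
    SELECT Other FROM #P WHERE Val = 2;

    -- Scan: Other is not a key column, so Other = 3 can only be a Predicate,
    -- evaluated against every row the scan reads.
    SELECT Val FROM #P WHERE Other = 3;

    -- Both: seek on Val = 2, then a residual Predicate on Other = 3.
    SELECT ID FROM #P WHERE Val = 2 AND Other = 3;

    SET STATISTICS XML OFF;
    ```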

  • 2020-12-18 05:34

    To answer your question of why SQL Server does this: the batch is normally compiled as a whole before any of it runs, so when the plan for your SELECT statement is generated, the assignments have not yet executed and the optimiser does not know that @val1 and @val2 will hold 'val1' and 'val2' respectively.

    When SQL Server does not know the value, it has to make a best guess about how many rows will match it, which can sometimes lead to sub-optimal plans. My main point is that the same query with different values can generate different plans. Consider this simple example:

    IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
        DROP TABLE #T;
    
    CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
    INSERT #T (Val)
    SELECT  TOP 991 1
    FROM    sys.all_objects a
    UNION ALL
    SELECT  TOP 9 ROW_NUMBER() OVER(ORDER BY a.object_id) + 1
    FROM    sys.all_objects a;
    
    CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
    

    All I have done here is create a simple table and add 1000 rows with values 1-10 in the column Val; however, 1 appears 991 times and the other nine values appear only once. The premise is that this query:

    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = 1;
    

    would be more efficient as a plain scan of the entire table than as an index seek followed by 991 bookmark lookups to fetch Filler. However, with only one matching row, the following query:

    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = 2;
    

    will be more efficient as an index seek plus a single bookmark lookup to fetch Filler (running these two queries will confirm this).
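    One way to check this, assuming #T from the script above still exists in the session, is to compare the "logical reads" figures reported by SET STATISTICS IO ON. The INDEX table hints are only there to force the alternative plan for comparison: INDEX(IX_T__Val) forces the nonclustered index, and INDEX(1) forces the clustered index.

    ```tsql
    -- Assumes #T from the first script (Val = 1 in 991 rows, Val = 2 in 1 row).
    SET STATISTICS IO ON;

    -- The optimiser's own choices: expect a scan for Val = 1, a seek + lookup for Val = 2.
    SELECT COUNT(Filler) FROM #T WHERE Val = 1;
    SELECT COUNT(Filler) FROM #T WHERE Val = 2;

    -- Forced alternatives; compare their logical reads with the queries above.
    SELECT COUNT(Filler) FROM #T WITH (INDEX(IX_T__Val)) WHERE Val = 1; -- seek + 991 lookups
    SELECT COUNT(Filler) FROM #T WITH (INDEX(1)) WHERE Val = 2;         -- clustered index scan

    SET STATISTICS IO OFF;
    ```

    The read counts appear in the Messages output rather than in the result sets.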

    I am pretty sure the cut-off point between a seek plus bookmark lookups and a scan varies with the situation, but it is fairly low. Using the example table and a bit of trial and error, I found that the Val column needed 38 rows with the value 2 before the optimiser chose a full (clustered index) scan over an index seek and bookmark lookups:

    IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
        DROP TABLE #T;
    
    DECLARE @i INT = 38;  -- rows with Val = 2; the table totals (991 - @i) + @i + 8 = 999 rows
    
    CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
    INSERT #T (Val)
    SELECT  TOP (991 - @i) 1
    FROM    sys.all_objects a
    UNION ALL
    SELECT  TOP (@i) 2
    FROM    sys.all_objects a
    UNION ALL
    SELECT  TOP 8 ROW_NUMBER() OVER(ORDER BY a.object_id) + 2
    FROM    sys.all_objects a;
    
    CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
    
    SELECT  COUNT(Filler), COUNT(*)
    FROM    #T
    WHERE   Val = 2;
    

    So for this example the tipping point is around 3.7% of the table: 37 matching rows still gets a seek, while 38 gets a scan.

    Because the optimiser does not know how many rows will match when you use a variable, it has to guess. The simplest way is to take the total number of rows and divide it by the number of distinct values in the column, so in this example the estimated number of rows for WHERE Val = @Val is 1000 / 10 = 100. The actual algorithm is more complex than this, but it will do for this example. So when we look at the execution plan for:

    DECLARE @i INT = 2;
    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = @i;
    


    In the execution plan (with the original data), the estimated number of rows is 100, but the actual number of rows is 1. From the previous steps we know that at 38 or more rows the optimiser opts for a clustered index scan over an index seek, so because the best guess for the number of rows is higher than this, the plan for an unknown variable is a clustered index scan.
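    A rough sketch of where that guess of 100 comes from, assuming #T holds the original data: the statistics on IX_T__Val store an "All density" of 1 / (number of distinct values), and for an unknown value the estimate is roughly the row count multiplied by that density. DBCC SHOW_STATISTICS is a standard command, but whether it accepts the temp table as N'tempdb..#T' in your environment is an assumption; the second query simply reproduces the arithmetic by hand.

    ```tsql
    -- Assumes #T from the first script: 1000 rows, 10 distinct values of Val.
    -- Density vector: "All density" for Val should be 1 / 10 = 0.1.
    DBCC SHOW_STATISTICS (N'tempdb..#T', IX_T__Val) WITH DENSITY_VECTOR;

    -- The same estimate by hand: rows * density = rows / distinct values.
    SELECT  COUNT(*)                             AS TotalRows,
            COUNT(DISTINCT Val)                  AS DistinctVals,
            COUNT(*) * 1.0 / COUNT(DISTINCT Val) AS EstimateForUnknownValue -- 1000 / 10 = 100
    FROM    #T;
    ```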

    To further test the theory, we can create the table with 1000 rows of the numbers 1-27, roughly evenly distributed (so the estimated row count will be 1000 / 27 ≈ 37.037):

    IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
        DROP TABLE #T;
    
    CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
    INSERT #T (Val)
    SELECT  TOP 27 ROW_NUMBER() OVER(ORDER BY a.object_id)
    FROM    sys.all_objects a;
    
    INSERT #T (Val)
    SELECT  TOP 973 t1.Val
    FROM    #T AS t1
            CROSS JOIN #T AS t2
            CROSS JOIN #T AS t3
    ORDER BY t2.Val, t3.Val;
    
    CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
    

    Running the query again now gives a plan with an index seek:

    DECLARE @i INT = 2;
    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = @i;
    

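    To double-check the distribution this script produces, assuming #T now holds the 27-value data: the 27 seed rows plus 36 complete groups of 27 from the cross join give each value 37 rows, and the one remaining row from TOP 973 gives a single value a 38th. The estimate for an unknown value is then 1000 / 27, just under the 38-row tipping point found earlier.

    ```tsql
    -- Assumes #T holds the 27-value data from the script above.
    -- Expect 26 values with 37 rows and one value with 38 (26 * 37 + 38 = 1000).
    SELECT  Cnt, COUNT(*) AS NumValues
    FROM    (SELECT Val, COUNT(*) AS Cnt FROM #T GROUP BY Val) AS perVal
    GROUP BY Cnt
    ORDER BY Cnt;

    -- Density-based estimate for WHERE Val = @i: about 37.04 rows.
    SELECT  COUNT(*) * 1.0 / COUNT(DISTINCT Val) AS EstimateForUnknownValue
    FROM    #T;
    ```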

    So hopefully that covers pretty comprehensively why you get that plan. The next question is how to force a different plan. One answer is the query hint OPTION (RECOMPILE), which forces the statement to be compiled at execution time, when the value of the variable is known. Reverting to the original data, where the best plan for Val = 2 is an index seek with a key lookup but using a variable yields a clustered index scan, we can run:

    DECLARE @i INT = 2;
    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = @i;
    
    GO
    
    DECLARE @i INT = 2;
    SELECT  COUNT(Filler)
    FROM    #T
    WHERE   Val = @i
    OPTION (RECOMPILE);
    


    The latter uses the index seek and key lookup because the value of the variable is known when the statement is compiled at execution time, so the most appropriate plan for that specific value is chosen. The trouble with OPTION (RECOMPILE) is that the plan is not reused from the plan cache, so you pay the additional cost of compiling the query on every execution.
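    For comparison, a sketch of what happens with a true parameter rather than a local variable, assuming #T holds the original data; @p is a hypothetical parameter name. With sys.sp_executesql the parameter's value is sniffed when the statement is first compiled, so the plan suits that first value, and it is then typically cached and reused for later values without recompiling. That reuse is exactly what OPTION (RECOMPILE) gives up.

    ```tsql
    -- Assumes #T holds the original data (Val = 1: 991 rows, Val = 2: 1 row).
    -- First execution: the plan is compiled for the sniffed value @p = 2,
    -- so expect an index seek + key lookup.
    EXEC sys.sp_executesql
        N'SELECT COUNT(Filler) FROM #T WHERE Val = @p;',
        N'@p INT',
        @p = 2;

    -- Second execution: the cached plan (built for @p = 2) is typically reused,
    -- even though a scan would suit Val = 1 better.
    EXEC sys.sp_executesql
        N'SELECT COUNT(Filler) FROM #T WHERE Val = @p;',
        N'@p INT',
        @p = 1;
    ```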

  • 2020-12-18 05:38

    Try

    declare @val1 nvarchar(40), @val2 nvarchar(40);
    set @val1 = 'val1';
    set @val2 = 'val2';
    
    select 
        min(id) 
    from 
        scor_inv_binaries 
    where 
        col1 in (@val1, @val2) 
    group by 
        col1
    OPTION (RECOMPILE)
    
  • What data type is col1?

    Your variables are nvarchar, whereas your literals are varchar/char. If col1 is varchar/char, it may be doing the index scan because nvarchar has higher data type precedence than varchar, so each value in col1 has to be implicitly converted to nvarchar for the comparison.
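    A sketch that reproduces the effect. The table #S and its data are hypothetical, and col1 is assumed to be varchar with a SQL collation (SQL_Latin1_General_CP1_CI_AS), since the question does not say; with a Windows collation the optimiser can often still seek through the conversion, so the difference may not show.

    ```tsql
    -- Hypothetical table: col1's type and collation are assumptions.
    IF OBJECT_ID(N'tempdb..#S', 'U') IS NOT NULL
        DROP TABLE #S;

    CREATE TABLE #S (
        id   INT IDENTITY PRIMARY KEY,
        col1 VARCHAR(40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    );

    INSERT #S (col1)
    SELECT  TOP 1000 CONCAT('val', ROW_NUMBER() OVER (ORDER BY a.object_id))
    FROM    sys.all_objects a;

    CREATE NONCLUSTERED INDEX IX_S__col1 ON #S (col1);

    DECLARE @n NVARCHAR(40) = N'val1', @v VARCHAR(40) = 'val1';

    -- nvarchar outranks varchar, so col1 is wrapped in CONVERT_IMPLICIT:
    -- expect an index scan with a residual Predicate.
    SELECT MIN(id) FROM #S WHERE col1 = @n;

    -- Same type as the column: the index can be used for a seek.
    SELECT MIN(id) FROM #S WHERE col1 = @v;
    ```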
