Why is doing a top(1) on an indexed column in SQL Server slow?

离开以前 2021-01-03 22:13

I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) there is an index on one column (campaignid_int).

Now I have 700k rows wher

8 answers
  • 2021-01-03 22:29

    You aren't specifying an ORDER BY clause in your query, so the optimiser is not being told which sort order it should select the top 1 from. SQL Server won't just take a random row; it will order the rows by something and take the top 1, and it may be choosing to order by something that is sub-optimal. I would suggest that you add an ORDER BY x clause, where x is the clustered key on that table, as that will probably be the fastest.

    This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.
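
    As a rough sketch of that suggestion: the table name and filter value below are taken from the queries in the other answers, while `id` is only a placeholder for the clustered key, whose real name is not given in the question.

    ```sql
    -- Assumption: `id` stands in for the clustered key column of outgoing_messages.
    -- Replace it with the actual clustered key before running.
    SELECT TOP (1) connectionid
    FROM outgoing_messages
    WHERE campaignid_int = 3835
    ORDER BY id;
    ```

    Comparing the execution plan with and without the ORDER BY should show whether the sort order is a factor at all.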

  • 2021-01-03 22:38

    The index may be useless for 2 reasons:

    • 700k rows out of 10 million may not be selective enough
    • and/or
    • connectionid needs to be included so the entire query can be answered from the index alone

    Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.

    So, I suggest this...

    CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)
    
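
    One way to check whether the covering index actually gets used is to compare I/O and timing output before and after creating it. This is only a sketch; it assumes the real table is outgoing_messages (the name used in the other answers) rather than the MyTable placeholder above.

    ```sql
    -- Print logical reads and CPU/elapsed time for each statement in this session.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT TOP (1) connectionid
    FROM outgoing_messages
    WHERE campaignid_int = 3835;

    -- Switch the extra output off again.
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    ```

    If the index is picked up, the logical reads reported in the Messages tab should drop sharply compared with the clustered index scan.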
  • 2021-01-03 22:45

    If the campaignid_int column is not indexed, add an index to it. That should speed up the query. Right now I presume that you need to do a full table scan to find the matches for campaignid_int = 3835 before the top(1) row is returned (filtering occurs before results are returned).

    EDIT: An index is already in place, but since SQL Server does a clustered index scan, the optimizer has ignored the index. This is probably due to (many) duplicate rows with the same campaignid_int value. You should consider indexing differently or query on a different column to get the connectionid you want.

  • 2021-01-03 22:45

    This doesn't answer your question, but try using:

    SET ROWCOUNT 1
    SELECT connectionid
    FROM outgoing_messages WITH (NOLOCK)
    WHERE campaignid_int = 3835
    -- Reset the limit, otherwise it stays in effect for the rest of the session
    SET ROWCOUNT 0
    

    I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.
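
    If you want to rule out stale statistics or a fragmented index, something along these lines may help. Treat it as a sketch: it touches every index on the table because the name of the index on campaignid_int is not given in the question.

    ```sql
    -- Option 1: refresh the optimizer's statistics for the table by reading every row.
    UPDATE STATISTICS outgoing_messages WITH FULLSCAN;

    -- Option 2: rebuild all indexes on the table, which also refreshes the index statistics.
    -- This can be expensive on a table with around 10 million rows, so run it off-peak.
    ALTER INDEX ALL ON outgoing_messages REBUILD;
    ```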

  • 2021-01-03 22:46

    Due to the statistics, you should explicitly ask the optimizer to use the index you've created instead of the clustered one.

    SELECT  TOP (1) connectionid
    FROM    outgoing_messages WITH (NOLOCK, index(idx_connectionid))
    WHERE  (campaignid_int = 3835)
    

    I hope it will solve the issue.

    Regards, Enrique

  • 2021-01-03 22:47

    but since I'm specifying 'top(1)' it means: give me any row. Why would it first crawl through the 700k rows just to return one? – reinier 30 mins ago

    Sorry, I can't comment yet, but the answer here is that SQL Server does not interpret "Top 1" as the human "Bring me the first one you find". Instead of the expected "Give me any row", SQL Server goes and fetches the first of all matching rows. It only knows which row that is after fetching all of them and then discarding the rest. Very thorough, but in your case not really fast.

    The main issue, as others said, is your statistics and the selectivity of your index. If you have another unique field in your table (like an identity column), then try a combined index with campaignid_int first and the unique column second. As you only query on campaignid_int, it has to be the first part of the key. It sounds worth a try, as this index should have a higher selectivity, so the optimizer can use it better than doing an index crawl.
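
    A minimal sketch of such a combined index follows. The index name and the `id` column are hypothetical; substitute whatever unique column your table really has.

    ```sql
    -- Assumption: `id` is a unique (for example identity) column on outgoing_messages.
    -- campaignid_int comes first because it is the column the query filters on.
    CREATE NONCLUSTERED INDEX IX_outgoing_messages_campaign_id
        ON outgoing_messages (campaignid_int, id);
    ```

    After creating it, rerun the TOP (1) query and check the plan to see whether the optimizer switches from the clustered index scan to a seek on the new index.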
