Here's the query:
SELECT top 100 a.LocationId, b.SearchQuery, b.SearchRank
FROM dbo.Locations a
INNER JOIN dbo.LocationCache b ON a.LocationId = b.LocationId
WHERE a.CountryId = 2
AND a.[Type] = 7
I did a quick test and came up with the following:
CREATE TABLE #Locations
(LocationID INT NOT NULL ,
CountryID INT NOT NULL ,
[Type] INT NOT NULL
CONSTRAINT PK_Locations
PRIMARY KEY CLUSTERED ( LocationID ASC )
)
CREATE NONCLUSTERED INDEX [LocationsIndex01] ON #Locations
(
CountryID ASC,
[Type] ASC
)
CREATE TABLE #LocationCache
(LocationID INT NOT NULL ,
SearchQuery VARCHAR(50) NULL ,
SearchRank INT NOT NULL
CONSTRAINT PK_LocationCache
PRIMARY KEY CLUSTERED ( LocationID ASC )
)
CREATE NONCLUSTERED INDEX [LocationCacheIndex01] ON #LocationCache
(
LocationID ASC,
SearchQuery ASC,
SearchRank ASC
)
INSERT INTO #Locations
SELECT 1,1,1 UNION
SELECT 2,1,4 UNION
SELECT 3,2,7 UNION
SELECT 4,2,7 UNION
SELECT 5,1,1 UNION
SELECT 6,1,4 UNION
SELECT 7,2,7 UNION
SELECT 8,2,7 --UNION
INSERT INTO #LocationCache
SELECT 4,'BlahA',10 UNION
SELECT 3,'BlahB',9 UNION
SELECT 2,'BlahC',8 UNION
SELECT 1,'BlahD',7 UNION
SELECT 8,'BlahE',6 UNION
SELECT 7,'BlahF',5 UNION
SELECT 6,'BlahG',4 UNION
SELECT 5,'BlahH',3 --UNION
SELECT * FROM #Locations
SELECT * FROM #LocationCache
SELECT top 3 a.LocationId, b.SearchQuery, b.SearchRank
FROM #Locations a
INNER JOIN #LocationCache b ON a.LocationId = b.LocationId
WHERE a.CountryId = 2
AND a.[Type] = 7
DROP TABLE #Locations
DROP TABLE #LocationCache
For me, the query plan shows two seeks with a nested loop inner join. If you run this, do you get both seeks? If you do, then do a test on your system: create copies of your Locations and LocationCache tables, call them say Locations2 and LocationCache2, recreate all the indexes, and copy your data into them. Then try your query against the new tables. A rough sketch of that copy step is below.
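A minimal sketch of that copy, assuming the table and column names from the question (the Locations2/LocationCache2 names and the index definitions here are only illustrative; mirror whatever your real tables actually have):
-- copy the data into new tables
SELECT * INTO dbo.Locations2 FROM dbo.Locations
SELECT * INTO dbo.LocationCache2 FROM dbo.LocationCache
-- recreate the clustered primary keys on the copies
ALTER TABLE dbo.Locations2 ADD CONSTRAINT PK_Locations2 PRIMARY KEY CLUSTERED (LocationId)
ALTER TABLE dbo.LocationCache2 ADD CONSTRAINT PK_LocationCache2 PRIMARY KEY CLUSTERED (LocationId)
-- recreate the nonclustered indexes (column lists assumed to mirror the originals)
CREATE NONCLUSTERED INDEX IX_Locations2_CountryType ON dbo.Locations2 (CountryId, [Type])
CREATE NONCLUSTERED INDEX IX_LocationCache2_Covering ON dbo.LocationCache2 (LocationId, SearchQuery, SearchRank)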
In short: you do not have a filter on LocationCache, so the whole table's content has to be returned. You have a fully covering index, so a single index SCAN is the cheapest operation, and the query optimizer picks it.
To optimize:
You are joining the whole tables and only afterwards taking the top 100 results. I don't know how big they are, but try filtering the [Locations] table on CountryId and [Type] in a subquery first, and then join just that result with [LocationCache]; see the sketch below. It will be much faster if you have more than 1,000 rows there.
Also, try adding more restrictive filters before the joins if possible.
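A rough sketch of that shape, assuming the column names from the question (whether it actually beats the original plan depends on your row counts and statistics):
SELECT TOP 100 f.LocationId, b.SearchQuery, b.SearchRank
FROM (
    -- restrict Locations first, so the join only ever sees qualifying rows
    SELECT LocationId
    FROM dbo.Locations
    WHERE CountryId = 2
      AND [Type] = 7
) f
INNER JOIN dbo.LocationCache b ON f.LocationId = b.LocationId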
Index Scan: Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.
Index Seek: Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.
If there is an index on a table and the query touches a large amount of data (meaning it retrieves more than roughly 50 to 90 percent of the rows), then the optimizer will simply scan all the data pages to retrieve the data rows.
source
Whilst bearing in mind that it will result in a query that may perform badly as and when additional changes are made to it, using an INNER LOOP JOIN should force the covering index on dbo.LocationCache to be used.
SELECT top 100 a.LocationId, b.SearchQuery, b.SearchRank
FROM dbo.Locations a
INNER LOOP JOIN dbo.LocationCache b ON a.LocationId = b.LocationId
WHERE a.CountryId = 2
AND a.Type = 7
It is using an Index Scan primarily because it is also using a Merge Join. The Merge Join operator requires two input streams that are both sorted in an order that is compatible with the Join conditions.
And it is using the Merge Join operator to implement your INNER JOIN because it believes that will be faster than the more typical Nested Loop Join operator. And it is probably right (it usually is): by using the two indexes it has chosen, it has input streams that are both pre-sorted according to your join condition (LocationID). When the input streams are pre-sorted like this, Merge Joins are almost always faster than the other two (Loop and Hash Joins).
The downside is what you have noticed: it appears to be scanning in the whole index, so how can that be faster if it is reading so many records that may never be used? The answer is that Scans (because of their sequential nature) can read anywhere from 10 to 100 times as many records/second as Seeks.
Now Seeks usually win because they are selective: they only get the rows that you ask for, whereas Scans are non-selective: they must return every row in the range. But because Scans have a much higher read rate, they can frequently beat Seeks as long as the ratio of Discarded Rows to Matching Rows is lower than the ratio of Scan rows/sec VS. Seek rows/sec.
Questions?
OK, I have been asked to explain the last sentence more:
A "Discarded Row" is one that the Scan reads (because it has to read everything in the index) but that will be rejected by the Merge Join operator, because it does not have a match on the other side, possibly because the WHERE clause condition has already excluded it.
"Matching Rows" are the ones that it read that are actually matched to something in the Merge Join. These are the same rows that would have been read by a Seek if the Scan were replaced by a Seek.
You can figure out what these are by looking at the statistics in the Query Plan. See that huge fat arrow to the left of the Index Scan? That represents how many rows the optimizer thinks it will read with the Scan. The statistics box of the Index Scan that you posted shows that the Actual Rows returned is about 5.4M (5,394,402). This is equal to:
TotalScanRows = (MatchingRows + DiscardedRows)
(In my terms, anyway). To get the Matching Rows, look at the "Actual Rows" reported by the Merge Join operator (you may have to take off the TOP 100 to get this accurately). Once you know this, you can get the Discarded rows by:
DiscardedRows = (TotalScanRows - MatchingRows)
And now you can calculate the ratio.
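For instance (purely illustrative numbers, not taken from your plan): if the Scan reads 5,400,000 rows and the Merge Join reports 450,000 Actual Rows, then DiscardedRows = 5,400,000 - 450,000 = 4,950,000, which is a Discarded-to-Matching ratio of 11:1. If the Scan can read, say, 50 times as many rows/sec as a Seek, that ratio is well under 50:1, so the Scan is still the cheaper choice despite all the discarded rows.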
Have you tried to update your statistics?
UPDATE STATISTICS dbo.LocationCache
Here is a good reference on what that does and why the query optimizer will choose a scan over a seek:
http://social.msdn.microsoft.com/Forums/en-CA/sqldatabaseengine/thread/82f49db8-0c77-4bce-b26c-1ad0a4af693b
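If a plain UPDATE STATISTICS does not change the plan, a fuller sample is worth trying. A minimal sketch, assuming the dbo.LocationCache table from the question (the statistics/index name passed to DBCC SHOW_STATISTICS is an assumption; substitute whatever yours is actually called):
-- rebuild the statistics from every row instead of a sample
UPDATE STATISTICS dbo.LocationCache WITH FULLSCAN
-- then inspect how fresh and how detailed the statistics are
DBCC SHOW_STATISTICS ('dbo.LocationCache', PK_LocationCache)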
Summary
There are several things to take into consideration here. Firstly, when SQL decides upon the best (good enough) plan to use, it looks at the query, and then also looks at the statistics that it stores about the tables involved.
It then decides whether it is more efficient to seek down the index or to scan the whole leaf level of the index (in this case, that means touching every page in the table, because it is a clustered index). It does this by looking at a number of things. Firstly, it estimates how many rows/pages it would need to read; the threshold at which a scan becomes cheaper than repeated seeks is called the tipping point, and it is a lower percentage than you may think. See this great Kimberly Tripp blog: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx
If you are within the limits of the tipping point but are still getting a scan, it may be because your statistics are out of date or your index is heavily fragmented.
It is possible to force SQL to seek an index by using the FORCESEEK query hint, but please use this with caution: generally, provided you keep everything well maintained, SQL is pretty good at deciding what the most efficient plan will be! An example of how that hint fits into the query from the question is below.
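A minimal sketch, assuming the tables and filter from the original query (FORCESEEK is a table hint, available from SQL Server 2008 onwards, so it goes in a WITH clause on the table you want the seek on):
SELECT TOP 100 a.LocationId, b.SearchQuery, b.SearchRank
FROM dbo.Locations a
INNER JOIN dbo.LocationCache b WITH (FORCESEEK) ON a.LocationId = b.LocationId
WHERE a.CountryId = 2
AND a.[Type] = 7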