Query Index for Dictionary Based Queries

后端未结

关注

 3  678

What would be the most efficient way to query and index for the following:

SELECT * Persons.LastName A-D
SELECT * Persons.LastName E-L
SELECT * Persons.LastN


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2020-12-21 18:48
              
            
            
                                                                       
You predicate is sargable.

If you run this query on an indexed field:

SELECT  *
FROM    persons
WHERE   last_name >= 'a'
        AND last_name < 'e'


it produces the following plan:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([MYDB].[dbo].[PERSONS].[ID]) OPTIMIZED)
       |--Index Seek(OBJECT:([MYDB].[dbo].[PERSONS].[IX_PERSONS_LAST_NAME]), SEEK:([MYDB].[dbo].[PERSONS].[LAST_NAME] >= 'a' AND [MYDB].[dbo].[PERSONS].[LAST_NAME] < 'E'),  WHERE:([MYDB].[dbo].[PERSONS].[LAST_NAME] like '[a-d]%') ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([MYDB].[dbo].[PERSONS].[IX_PERSONS_LAST_NAME]), SEEK:([MYDB].[dbo].[PERSONS].[ID]=[MYDB].[dbo].[PERSONS].[ID]) LOOKUP ORDERED FORWARD)


which is equivalent to running this query:

SELECT  *
FROM    persons
WHERE   last_name >= 'a'
        AND last_name < 'e'

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  滥情空心        
                
              
                            
                2020-12-21 18:56
              
            
            
                                                                       
As Sam says, LIKE '[a-d]%' is SARGable (well almost). Almost because of a not optimized Predicate (see below for more info).

Example #1: if you run this query in AdventureWorks2008R2 database

SET STATISTICS IO ON;
SET NOCOUNT ON;

PRINT 'Example #1:';
SELECT  p.BusinessEntityID, p.LastName
FROM    Person.Person p
WHERE   p.LastName LIKE '[a-a]%'


then, you will get an execution plan based on Index Seek operator (optimized predicate: green rectangle, non-optimized predicate:red rectangle): 

The output for SET STATISTICS IO is 

Example #1:
Table 'Person'. Scan count 1, logical reads 7


This means the server have to read 7 pages from buffer pool. Also, in this case, the index IX_Person_LastName_FirstName_MiddleName includes all columns required by SELECT, FROMand WHERE clauses: LastName and BusinessEntityID. If table has a clustered index then all non clustered indices will include the columns from clustered index key (BusinessEntityID is the key for PK_Person_BusinessEntityID clustered index).

But:

1) Your query have to show all columns because of SELECT * (it's a bad practice): BusinessEntityID, LastName, FirstName, MiddleName, PersonType, ..., ModifiedDate. 

2) The index (IX_Person_LastName_FirstName_MiddleName in previous example) doesn't includes all required columns. This is the reason why, for this query, this index is a non-covering index.

Now, if you execute the next queries then you will get diff. [actual] execution plans (SSMS, Ctrl + M):

SET STATISTICS IO ON;
SET NOCOUNT ON;

PRINT 'Example #2:';
SELECT  p.*
FROM    Person.Person p
WHERE   p.LastName LIKE '[a-a]%';
PRINT @@ROWCOUNT;

PRINT 'Example #3:';
SELECT  p.*
FROM    Person.Person p
WHERE   p.LastName LIKE '[a-z]%';
PRINT @@ROWCOUNT;

PRINT 'Example #4:';
SELECT  p.*
FROM    Person.Person p WITH(FORCESEEK)
WHERE   p.LastName LIKE '[a-z]%';
PRINT @@ROWCOUNT;


Results:

Example #2:
Table 'Person'. Scan count 1, logical reads 2805, lob logical reads 0
911

Example #3:
Table 'Person'. Scan count 1, logical reads 3817, lob logical reads 0 
19972

Example #4:
Table 'Person'. Scan count 1, logical reads 61278, lob logical reads 0
19972


Execution plans:


Plus: the query will give you the number of pages for every index created on  'Person.Person':

SELECT i.name, i.type_desc,f.alloc_unit_type_desc, f.page_count, f.index_level FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('Person.Person'), 
    DEFAULT, DEFAULT, 'DETAILED' ) f 
INNER JOIN sys.indexes i ON f.object_id = i.object_id AND f.index_id = i.index_id
ORDER BY i.type


name                                    type_desc    alloc_unit_type_desc page_count index_level
--------------------------------------- ------------ -------------------- ---------- -----------
PK_Person_BusinessEntityID              CLUSTERED    IN_ROW_DATA          3808       0
PK_Person_BusinessEntityID              CLUSTERED    IN_ROW_DATA          7          1
PK_Person_BusinessEntityID              CLUSTERED    IN_ROW_DATA          1          2
PK_Person_BusinessEntityID              CLUSTERED    ROW_OVERFLOW_DATA    1          0
PK_Person_BusinessEntityID              CLUSTERED    LOB_DATA             1          0
IX_Person_LastName_FirstName_MiddleName NONCLUSTERED IN_ROW_DATA          103        0
IX_Person_LastName_FirstName_MiddleName NONCLUSTERED IN_ROW_DATA          1          1

...


Now, if you compare Example #1 and Example #2 (both returns 911 rows)

`SELECT p.BusinessEntityID, p.LastName ... p.LastName LIKE '[a-a]%'`
vs.
`SELECT * ... p.LastName LIKE '[a-a]%'`


then you will see two diff.: 

a) 7 logical reads vs. 2805 logical reads and

b) Index Seek (#1) vs. Index Seek + Key Lookup (#2). 

You can see that the performance for SELECT * (#2) query is far worst (7 pages vs. 2805 pages).

Now, if you compare Example #3 and Example #4 (both returns 19972 rows)

`SELECT * ... LIKE '[a-z]%`
vs.
`SELECT * ... WITH(FORCESEEK) LIKE '[a-z]%`


then you will see two diff.: 

a) 3817 logical reads (#3) vs. 61278 logical reads (#4) and

b) Clustered Index Scan (PK_Person_BusinessEntityID has 3808 + 7 + 1 + 1 + 1 = 3818 pages) vs. Index Seek + Key Lookup. 

You can see that the performance for Index Seek + Key Lookup (#4)   query is far worst (3817 pages vs. 61278 pages). In this case, you can see that and Index Seek on IX_Person_LastName_FirstName_MiddleName plus a Key Lookup on PK_Person_BusinessEntityID (clustered index) will give you a lower performance than a 'Clustered Index Scan'. 

And all these bad execution plans are possible because of SELECT *.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  再見小時候        
                
              
                            
                2020-12-21 18:57
              
            
            
                                                                       
I'd look at your explain plan and turn on STATISTICS IO and STATISTICS time to see if anything jumps out at you. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复