What would be the most efficient way to query and index for the following:
SELECT * Persons.LastName A-D
SELECT * Persons.LastName E-L
SELECT * Persons.LastN
You predicate is sargable.
If you run this query on an indexed field:
SELECT *
FROM persons
WHERE last_name >= 'a'
AND last_name < 'e'
it produces the following plan:
|--Nested Loops(Inner Join, OUTER REFERENCES:([MYDB].[dbo].[PERSONS].[ID]) OPTIMIZED)
|--Index Seek(OBJECT:([MYDB].[dbo].[PERSONS].[IX_PERSONS_LAST_NAME]), SEEK:([MYDB].[dbo].[PERSONS].[LAST_NAME] >= 'a' AND [MYDB].[dbo].[PERSONS].[LAST_NAME] < 'E'), WHERE:([MYDB].[dbo].[PERSONS].[LAST_NAME] like '[a-d]%') ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([MYDB].[dbo].[PERSONS].[IX_PERSONS_LAST_NAME]), SEEK:([MYDB].[dbo].[PERSONS].[ID]=[MYDB].[dbo].[PERSONS].[ID]) LOOKUP ORDERED FORWARD)
which is equivalent to running this query:
SELECT *
FROM persons
WHERE last_name >= 'a'
AND last_name < 'e'
As Sam says, LIKE '[a-d]%'
is SARGable (well almost). Almost because of a not optimized Predicate
(see below for more info).
Example #1: if you run this query in AdventureWorks2008R2 database
SET STATISTICS IO ON;
SET NOCOUNT ON;
PRINT 'Example #1:';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE '[a-a]%'
then, you will get an execution plan based on Index Seek
operator (optimized predicate: green rectangle, non-optimized predicate:red rectangle):
The output for SET STATISTICS IO
is
Example #1:
Table 'Person'. Scan count 1, logical reads 7
This means the server have to read 7 pages from buffer pool. Also, in this case, the index IX_Person_LastName_FirstName_MiddleName
includes all columns required by SELECT
, FROM
and WHERE
clauses: LastName and BusinessEntityID. If table has a clustered index then all non clustered indices will include the columns from clustered index key (BusinessEntityID is the key for PK_Person_BusinessEntityID clustered index).
But:
1) Your query have to show all columns because of SELECT *
(it's a bad practice): BusinessEntityID, LastName, FirstName, MiddleName, PersonType, ..., ModifiedDate.
2) The index (IX_Person_LastName_FirstName_MiddleName
in previous example) doesn't includes all required columns. This is the reason why, for this query, this index is a non-covering index.
Now, if you execute the next queries then you will get diff. [actual] execution plans (SSMS, Ctrl + M):
SET STATISTICS IO ON;
SET NOCOUNT ON;
PRINT 'Example #2:';
SELECT p.*
FROM Person.Person p
WHERE p.LastName LIKE '[a-a]%';
PRINT @@ROWCOUNT;
PRINT 'Example #3:';
SELECT p.*
FROM Person.Person p
WHERE p.LastName LIKE '[a-z]%';
PRINT @@ROWCOUNT;
PRINT 'Example #4:';
SELECT p.*
FROM Person.Person p WITH(FORCESEEK)
WHERE p.LastName LIKE '[a-z]%';
PRINT @@ROWCOUNT;
Results:
Example #2:
Table 'Person'. Scan count 1, logical reads 2805, lob logical reads 0
911
Example #3:
Table 'Person'. Scan count 1, logical reads 3817, lob logical reads 0
19972
Example #4:
Table 'Person'. Scan count 1, logical reads 61278, lob logical reads 0
19972
Execution plans:
Plus: the query will give you the number of pages for every index created on 'Person.Person':
SELECT i.name, i.type_desc,f.alloc_unit_type_desc, f.page_count, f.index_level FROM sys.dm_db_index_physical_stats(
DB_ID(), OBJECT_ID('Person.Person'),
DEFAULT, DEFAULT, 'DETAILED' ) f
INNER JOIN sys.indexes i ON f.object_id = i.object_id AND f.index_id = i.index_id
ORDER BY i.type
name type_desc alloc_unit_type_desc page_count index_level
--------------------------------------- ------------ -------------------- ---------- -----------
PK_Person_BusinessEntityID CLUSTERED IN_ROW_DATA 3808 0
PK_Person_BusinessEntityID CLUSTERED IN_ROW_DATA 7 1
PK_Person_BusinessEntityID CLUSTERED IN_ROW_DATA 1 2
PK_Person_BusinessEntityID CLUSTERED ROW_OVERFLOW_DATA 1 0
PK_Person_BusinessEntityID CLUSTERED LOB_DATA 1 0
IX_Person_LastName_FirstName_MiddleName NONCLUSTERED IN_ROW_DATA 103 0
IX_Person_LastName_FirstName_MiddleName NONCLUSTERED IN_ROW_DATA 1 1
...
Now, if you compare Example #1
and Example #2
(both returns 911 rows)
`SELECT p.BusinessEntityID, p.LastName ... p.LastName LIKE '[a-a]%'`
vs.
`SELECT * ... p.LastName LIKE '[a-a]%'`
then you will see two diff.:
a) 7 logical reads vs. 2805 logical reads and
b) Index Seek
(#1) vs. Index Seek
+ Key Lookup
(#2).
You can see that the performance for SELECT *
(#2) query is far worst (7 pages vs. 2805 pages).
Now, if you compare Example #3
and Example #4
(both returns 19972 rows)
`SELECT * ... LIKE '[a-z]%`
vs.
`SELECT * ... WITH(FORCESEEK) LIKE '[a-z]%`
then you will see two diff.:
a) 3817 logical reads (#3) vs. 61278 logical reads (#4) and
b) Clustered Index Scan
(PK_Person_BusinessEntityID has 3808 + 7 + 1 + 1 + 1 = 3818 pages) vs. Index Seek
+ Key Lookup
.
You can see that the performance for Index Seek
+ Key Lookup
(#4) query is far worst (3817 pages vs. 61278 pages). In this case, you can see that and Index Seek
on IX_Person_LastName_FirstName_MiddleName
plus a Key Lookup
on PK_Person_BusinessEntityID
(clustered index) will give you a lower performance than a 'Clustered Index Scan'.
And all these bad execution plans are possible because of SELECT *
.
I'd look at your explain plan and turn on STATISTICS IO and STATISTICS time to see if anything jumps out at you.