My table (SQL Server 2008) has more than 1 million records. When I order the records by a datetime column, the query takes 1 second, but when I order by ID (int), it takes only about 0.1 second.
If your datetime field contains a lot of distinct values and those values rarely change, define a clustered index on the datetime field; this will sort the actual data by the datetime value. See http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx for using clustered indexes.
This will make your int searches slower, though, as they will be relegated to using a non-clustered index.
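A minimal sketch of that change, assuming a hypothetical table dbo.mytable whose primary key PK_mytable on id is currently the clustered index, and whose datetime column is called created_at (none of these names come from the question). A table can have only one clustered index, so the existing one has to be made non-clustered:

```sql
-- Hypothetical names: dbo.mytable, PK_mytable, id, created_at.
-- Changing the clustered index rewrites the whole table, so try it on a copy first.

-- 1. Drop the clustered primary key (this fails if foreign keys reference it).
ALTER TABLE dbo.mytable DROP CONSTRAINT PK_mytable;

-- 2. Cluster the table on the datetime column.
CREATE CLUSTERED INDEX ix_mytable_created_at ON dbo.mytable (created_at);

-- 3. Re-create the primary key as a non-clustered index.
ALTER TABLE dbo.mytable ADD CONSTRAINT PK_mytable PRIMARY KEY NONCLUSTERED (id);
```

The clustered index is created before the primary key is re-added so that the non-clustered index is built only once, against the final row locators.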
Add the datetime column to a new index of its own; adding it to the existing id index still will not help much.
Maybe you could store the datetime as an int, but it would take time to convert it each time you store or retrieve data. (This is a common technique for storing things like IP addresses to get faster seek times.)
You should first check how your server stores datetime, because if it already stores it as an int or bigint, this will not change anything.
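One quick way to run that check on SQL Server is to compare storage sizes. This is only a sketch; it shows the width of each type, not how the optimizer treats it:

```sql
-- DATALENGTH returns the number of bytes used to store a value.
SELECT DATALENGTH(CAST(GETDATE() AS datetime)) AS datetime_bytes, -- 8
       DATALENGTH(CAST(0 AS int))              AS int_bytes,      -- 4
       DATALENGTH(CAST(0 AS bigint))           AS bigint_bytes;   -- 8
```

A SQL Server datetime is already a fixed-width 8-byte value, the same width as a bigint, so converting it to an integer type is unlikely to change the sort cost by itself.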
Ordering by id probably uses a clustered index scan, while ordering by datetime uses either sorting or index lookup.
Both of these methods are slower than a clustered index scan.
If your table is clustered by id, basically it means it is already sorted. The records are contained in a B+Tree which has a linked list linking the pages in id order. The engine should just traverse the linked list to get the records ordered by id.
If the ids were inserted in sequential order, this means that the physical order of the rows will match the logical order and the clustered index scan will be faster still.
If you want your records to be ordered by datetime, there are two options:
Take all the records from the table and sort them. This is slow.
Use the index on datetime. The index is stored in a separate space of the disk, which means the engine needs to shuttle between the index pages and table pages in a nested loop. This is slower too.
To improve the ordering, you can create a separate covering index on datetime:
CREATE INDEX ix_mytable_datetime ON mytable (datetime) INCLUDE (field1, field2, …)
Include in that index all the columns you use in your query.
This index is like a shadow copy of your table, but with the data sorted in a different order.
This will let you get rid of the key lookups (since the index contains all the data), which will make ordering by datetime as fast as ordering by id.
Update:
A fresh blog post on this problem:
To honor the ORDER BY, the engine has two alternatives:
Scan the rows using an index that already provides the requested order.
Read the rows in whatever order is cheapest and sort the result.
The first option is fast, the second is slow. The problem is that, in order to be used, the index has to be a covering index, meaning it contains all the columns in the SELECT projection list and all the columns used in WHERE clauses (at a minimum). If the index is not covering, the engine has to look up the clustered index (i.e. the 'table') for each row in order to retrieve the values of the needed columns. This constant lookup of values is expensive, and there is a tipping point at which the engine will (rightfully) decide it is more efficient to just scan the clustered index and sort the result, in effect ignoring your non-clustered index. For details, see The Tipping Point Query Answers.
Consider the following three queries:
SELECT dateColumn FROM table ORDER BY dateColumn
SELECT * FROM table ORDER BY dateColumn
SELECT someColumn FROM table ORDER BY dateColumn
The first one will use a non-clustered index on dateColumn. The second one will not use an index on dateColumn; for 1M rows it will likely choose a scan and sort instead. The third query, on the other hand, can benefit from an index on Table(dateColumn) INCLUDE (someColumn).
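As a rough illustration of the covered and non-covered cases, here is a sketch that assumes a hypothetical table dbo.mytable with the placeholder columns dateColumn and someColumn from the queries above. The index name is made up as well:

```sql
-- Hypothetical table and index names; dateColumn and someColumn are the
-- placeholder columns from the three queries above.
CREATE NONCLUSTERED INDEX ix_mytable_dateColumn
    ON dbo.mytable (dateColumn)
    INCLUDE (someColumn);

-- Report I/O and timing so the two queries can be compared.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Covered: every referenced column lives in the index, so the engine can
-- read the index in order without key lookups or a sort.
SELECT someColumn FROM dbo.mytable ORDER BY dateColumn;

-- Not covered: the remaining columns exist only in the clustered index,
-- so a clustered index scan followed by a sort is the likely plan.
SELECT * FROM dbo.mytable ORDER BY dateColumn;
```

Which plan the optimizer actually picks depends on the statistics and the row count, so compare the actual execution plans rather than relying on the comments.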
This topic is covered at length on MSDN; see Index Design Basics, General Index Design Guidelines, Nonclustered Index Design Guidelines, or How To: Optimize SQL Indexes.
Ultimately, the most important choice in your table design is the clustered index you use. Almost always the primary key (usually an auto-incremented ID) is left as the clustered index, a decision that benefits only certain OLTP loads.
And finally, a rather obvious question: Why in the world would you order 1 million rows?? You can't possibly display them, can you? Explaining a little bit more about your use case might help us find a better answer for you.
Have you added the DateTime field to "the" index or to an exclusive index? Are you filtering your selection by another field and the DateTime or only this one?
You must have an index with all the fields that you are filtering on, preferably in the same order, to optimize performance.