Just to be clear. Presumably, you are talking about a simple query such as:
select *
from table t;
First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by
, the ordering in the result set is a consequence of the query plan.
That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.
What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:
- Single threaded server.
- Single file filegroup
- No competing indexes
- No table partitions
I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.
Even on a small table, you can get surprises. Consider:
select NonPrimaryKeyColumn
from table
The query plan would probably decide to use an index on table(NonPrimaryKeyColumn)
rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by
or where
filtering.
If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by
have no explicit ordering.
And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by
in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by
is redundant because of the rest of the plan, then the order by
will not result in a sort.
And, of course you want to guarantee the ordering of results, you need order by
in the outermost query. Even a query like this:
select *
from (select top 100 t.* from t order by col1) t
Does not guarantee that the results are ordered in the final result set. You really need to do:
select *
from (select top 100 t.* from t order by col1) t
order by col1;
to guarantee the results in a particular order. This behavior is documented here.