I had rarely used SQL until recently, when I began using it daily. I notice that if no "order by" clause is used:
For PostgreSQL, if you omit the ORDER BY clause you could run the exact same query 100 times while the database is not being modified, and get one run in the middle in a different order than the others. In fact, each run could be in a different order.
One reason this could happen is that if the plan chosen involves a sequential scan of a table's heap, and there is already a seqscan of that table's heap in progress, your query may start its scan at whatever point the other scan has already reached, to reduce the need for disk access.
As other answers have pointed out, if you want the data in a certain order, specify that order. PostgreSQL will take the requested order into consideration in choosing a plan, and may use an index that provides data in that order, if that works out to be cheaper than getting the rows some other way and then sorting them.
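As a minimal sketch of "specify the order you want", the snippet below uses Python's built-in sqlite3 module only because it runs without a server; the `employee` table, its columns, and the `fetch_ordered` helper are hypothetical names invented for illustration. The same ORDER BY statement should work unchanged in PostgreSQL. Note the `id` tiebreaker: without it, the two rows named "Alice" could still come back in either order.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(3, "Carol", "ops"), (1, "Alice", "dev"), (2, "Bob", "dev"), (4, "Alice", "ops")],
)


def fetch_ordered(connection):
    """Return all rows sorted by name, using id as a tiebreaker so the order is total."""
    return connection.execute(
        "SELECT id, name, dept FROM employee ORDER BY name, id"
    ).fetchall()


rows = fetch_ordered(conn)
for row in rows:
    print(row)
```

Ordering by a unique column last (here `id`) is a design choice worth keeping in mind: an ORDER BY on a non-unique column only fixes the order between rows that differ in that column.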
GROUP BY provides no guarantee of order; PostgreSQL might sort the data to do the grouping, or it might use a hash table and return the rows in order of the number generated by the hashing algorithm (i.e., pretty random). And that might change from one run to the next.
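If grouped results need to come back in a particular order, the safe approach is to pair GROUP BY with an explicit ORDER BY. The sketch below again uses Python's sqlite3 module purely so that it is self-contained; the `orders` table and its data are made up for illustration, and the SQL statement itself should carry over to PostgreSQL as written.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("zoe", 5), ("adam", 10), ("zoe", 7), ("mia", 3), ("adam", 1)],
)

# GROUP BY does the aggregation; ORDER BY is what fixes the output order.
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

for customer, total in totals:
    print(customer, total)
```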
Section 20.2 <direct select statement: multiple rows>, subsection "General Rules" of the SQL-92 specification:
4) If an <order by clause> is not specified, then the ordering of the rows of Q is implementation-dependent.
It never ceased to amaze me when I was a DBA that this feature of SQL was so often thought of as quirky. Consider a simple program that runs against a text file and produces some output. If the program never changes, and the data never changes, you'd expect the output to never change.
As for this:
If no ORDER BY clause is included in the query, the returned order of rows is undefined.
Not strictly true - on every RDBMS I've ever worked on (Oracle, Informix, SQL Server, DB2 to name a few) a DISTINCT clause also has the same effect as an ORDER BY as finding unique values involves a sort by definition.
EDIT (6/2/14):
Create a simple table
For DISTINCT and ORDER BY, both the plan and the cost are the same, since ostensibly the same operation is being performed
And not surprisingly, the effect is thus the same
If you want order, include an ORDER BY. If you don't include an ORDER BY, you're telling SQL Server:
I don't care what order you return the rows, just return the rows
Since you don't care, SQL Server is going to decide how to return the rows in what it deems the most efficient manner possible right now (or according to the last time the plan for this specific query was cached). Therefore you should not rely on the behavior you observe. It can change from one run of the query to the next, with data changes, statistics changes, index changes, service packs, cumulative updates, upgrades, etc. etc. etc.
If no ORDER BY clause is included in the query, the returned order of rows is undefined.
Whilst some RDBMSes will return rows in specific orders in some situations even when an ORDER BY clause is omitted, such behaviour should never be relied upon.