Is a table intrinsically sorted by its primary key? If I have a table with the primary key on a BigInt identity column, can I trust that queries will always return the dat
This is a very common question, so there is a canned answer:
Without ORDER BY, there is no default sort order.
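One way to see this for yourself is to run the same query under a setting that deliberately changes the unordered result. The sketch below uses SQLite, not SQL Server, as an assumption for portability. SQLite's testing pragma `reverse_unordered_selects` makes many SELECTs without ORDER BY return rows in the reverse of their usual order. The table `readings` and the helpers `fetch_ids` and `demo` are hypothetical names for illustration.

```python
import sqlite3


def fetch_ids(conn: sqlite3.Connection, ordered: bool) -> list[int]:
    """Return every id in the hypothetical readings table."""
    sql = "SELECT id FROM readings"
    if ordered:
        # An explicit ORDER BY is the only thing that guarantees the order.
        sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


def demo() -> tuple[list[int], list[int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (id, value) VALUES (?, ?)",
        [(i, i * 0.5) for i in range(1, 6)],
    )
    # SQLite testing pragma: many SELECTs without ORDER BY now come back in
    # reverse of their usual order, exposing code that relies on implicit order.
    conn.execute("PRAGMA reverse_unordered_selects = ON")
    unordered = fetch_ids(conn, ordered=False)
    ordered = fetch_ids(conn, ordered=True)
    conn.close()
    return unordered, ordered


if __name__ == "__main__":
    unordered, ordered = demo()
    print("without ORDER BY:", unordered)
    print("with ORDER BY:   ", ordered)
```

The unordered query still returns the same rows, just not necessarily in primary-key order, while the ORDER BY query is unaffected by the pragma.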
Can you elaborate on why "The performance difference is significant"?
This may be implementation-specific, but MySQL seems to sort by the primary key by default. However, whenever you need a guarantee that rows will be returned in a certain order, you should add ORDER BY.
You must apply the ORDER BY to guarantee an order. If you are noticing a performance difference, then it is likely that, without the ORDER BY, your data was not actually coming back sorted; otherwise SQL Server must be behaving badly, since it is not realizing the data is already sorted. Adding the ORDER BY on already sorted data should not incur a performance penalty, since the RDBMS should be smart enough to recognize the order of the data.
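Whether ORDER BY costs anything can be checked in the query plan rather than guessed. The sketch below again uses SQLite as an assumed stand-in for SQL Server: its `EXPLAIN QUERY PLAN` output contains a `USE TEMP B-TREE FOR ORDER BY` step only when an explicit sort is needed. The table `readings`, the column `taken_at`, and the helpers `plan`, `needs_sort` and `demo` are hypothetical.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column (index 3) of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def needs_sort(conn: sqlite3.Connection, sql: str) -> bool:
    # SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY".
    return any("TEMP B-TREE" in step for step in plan(conn, sql))


def demo() -> dict[str, bool]:
    conn = sqlite3.connect(":memory:")
    # id aliases the rowid, which is the order the table is stored in.
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, taken_at TEXT, value REAL)"
    )
    result = {
        # Already stored in id order: the ORDER BY is free.
        "ORDER BY id": needs_sort(conn, "SELECT * FROM readings ORDER BY id"),
        # No index on taken_at: the engine must sort.
        "ORDER BY taken_at": needs_sort(
            conn, "SELECT * FROM readings ORDER BY taken_at"
        ),
    }
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Other engines expose the same information through their own plan output (for SQL Server, a Sort operator in the execution plan); the exact wording differs per engine.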
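A table is not necessarily 'clustered', i.e. physically organized by its PK; you have the option of specifying it as such. A table without that option is a "HEAP" (rows in no particular order), and the option you are looking for is "CLUSTERED" in SQL Server (in Oracle it's called an index-organized table, or IOT). Note that in SQL Server a PRIMARY KEY constraint creates a clustered index by default, unless you specify NONCLUSTERED or the table already has a clustered index.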
The earlier poster is correct: SQL (and the relational theory behind it) defines the result of a SELECT without ORDER BY as an unordered collection of rows (tuples).
SQL generally tries to stay in the logical realm and not make assumptions about the physical organization or location of the data. The CLUSTERED option lets us control the physical organization for practical, real-life situations.
In SQL Server: no, a table is physically organized by its clustering key, which defaults to the primary key but doesn't have to be the same.
The primary key's main function is to uniquely identify each row in the table - but it doesn't imply any (physical) sorting per se.
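The split between "identifies rows" and "determines storage order" can be seen in SQLite, used here as an assumed analogue rather than SQL Server itself. When a rowid table has a TEXT primary key, SQLite stores rows by the hidden rowid and enforces the primary key through a separate automatic unique index (reported by `PRAGMA index_list` with origin `pk`). The table `sensors` and the function `demo` are hypothetical.

```python
import sqlite3


def demo() -> tuple[list[str], list[str], bool]:
    conn = sqlite3.connect(":memory:")
    # TEXT primary key: rows are still stored by the hidden rowid, and the
    # PK is enforced by a separate automatic unique index.
    conn.execute("CREATE TABLE sensors (code TEXT PRIMARY KEY, label TEXT)")
    for code in ("c", "a", "b"):
        conn.execute(
            "INSERT INTO sensors (code, label) VALUES (?, ?)", (code, code.upper())
        )

    # PRAGMA index_list columns: seq, name, unique, origin, partial.
    pk_indexes = [
        row[1] for row in conn.execute("PRAGMA index_list(sensors)") if row[3] == "pk"
    ]

    # Storage (rowid) order follows insertion, not the primary key.
    by_rowid = [r[0] for r in conn.execute("SELECT code FROM sensors ORDER BY rowid")]

    # The primary key's actual job: rejecting duplicate identifiers.
    try:
        conn.execute("INSERT INTO sensors (code, label) VALUES ('a', 'dup')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    conn.close()
    return pk_indexes, by_rowid, duplicate_rejected


if __name__ == "__main__":
    print(demo())
```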
Not sure about the other database systems.
Marc
Data is physically stored by the clustered index, which is usually on the primary key but doesn't have to be.
Data in SQL is not guaranteed to come back in any particular order without an ORDER BY clause. You should always specify an ORDER BY clause when you need the data to be in a particular order. If the table is already sorted that way, the optimizer won't do any extra work, so there's no harm in having it there.
Without an ORDER BY clause, the RDBMS might return cached pages matching your query while it waits for records to be read in from disk. In that case, even if there is an index on the table, data might not arrive in the index's order. (Note this is just an example; I don't know or even think that a real-world RDBMS will do this, but it's acceptable behaviour for an SQL implementation.)
EDIT
If you see a performance impact when sorting versus when not sorting, you're probably sorting on a column (or set of columns) that doesn't have an index (clustered or otherwise). Given that it's a time series, you might be sorting by time while the clustered index is on the bigint primary key. SQL Server doesn't know that both increase the same way, so it has to re-sort everything.
If the time column and the primary key column are related by order (i.e. a higher primary key value never has an earlier timestamp), sort by the primary key instead. If they aren't related this way, move the clustered index from the primary key to whatever column(s) you're sorting by.
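The effect of moving the clustering key can be sketched in SQLite, used here as an assumed stand-in: a `WITHOUT ROWID` table is stored in primary-key order, much like a clustered index. The sketch compares a table clustered on the surrogate `id` with one clustered on `(taken_at, id)`, keeping `id` unique via a UNIQUE constraint. The table names `by_id` and `by_time`, the column `taken_at`, and the helpers are hypothetical.

```python
import sqlite3


def needs_sort(conn: sqlite3.Connection, sql: str) -> bool:
    """True if SQLite's plan contains an explicit sort step for ORDER BY."""
    detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any("TEMP B-TREE" in step for step in detail)


def demo() -> dict[str, bool]:
    conn = sqlite3.connect(":memory:")
    # Stored in order of the 64-bit surrogate key (id aliases the rowid).
    conn.execute(
        "CREATE TABLE by_id (id INTEGER PRIMARY KEY, taken_at TEXT NOT NULL, value REAL)"
    )
    # Stored in (taken_at, id) order; UNIQUE(id) keeps id a row identifier.
    conn.execute(
        """CREATE TABLE by_time (
               id INTEGER NOT NULL UNIQUE,
               taken_at TEXT NOT NULL,
               value REAL,
               PRIMARY KEY (taken_at, id)
           ) WITHOUT ROWID"""
    )
    query = "SELECT id, taken_at, value FROM {} ORDER BY taken_at"
    result = {name: needs_sort(conn, query.format(name)) for name in ("by_id", "by_time")}
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Adding `id` to the clustering key keeps it unique even when two rows share a timestamp; in SQL Server the corresponding step would be a clustered index on the time column, with the primary key made NONCLUSTERED.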