I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother? Since good queries explicitly name columns when inserting data, is the
Update:
In MySQL, there may be a reason to do this.
Since variable datatypes (like VARCHAR) are stored with variable lengths in InnoDB, the database engine has to traverse all previous columns in each row to find the offset of a given one.
The impact may be as big as 17% for 20 columns.
See this entry in my blog for more detail:
In Oracle, trailing NULL columns consume no space, which is why you should always put them at the end of the table.
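To make the trailing-NULL point concrete, here is a toy size model in Python. It assumes a simplified layout in which every stored column costs one length byte plus its data, a NULL between two values costs only the length byte, and NULLs after the last value are not stored at all. The function name and sample rows are made up for illustration, and real Oracle rows also carry headers and use longer length prefixes for large values.

```python
# Toy model only: each stored column costs 1 length byte plus its data bytes,
# a NULL in the middle of the row costs the length byte alone, and NULLs after
# the last non-NULL column are not stored at all.
def stored_row_bytes(values):
    """Return the modelled size in bytes of one row's column data."""
    # Find the last column that actually holds a value.
    last = -1
    for i, v in enumerate(values):
        if v is not None:
            last = i
    total = 0
    for v in values[: last + 1]:
        total += 1  # length byte, paid even for an embedded NULL
        if v is not None:
            total += len(v)
    return total


# Same data, two column orders (hypothetical rows).
nulls_in_middle = [b"42", None, None, None, b"x"]
nulls_at_end = [b"42", b"x", None, None, None]
print(stored_row_bytes(nulls_in_middle))
print(stored_row_bytes(nulls_at_end))
```

In this model the row with its NULLs at the end is smaller, because the three trailing NULLs contribute nothing.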
Also, in Oracle and in SQL Server, a large row can cause ROW CHAINING.
ROW CHAINING means splitting a row that doesn't fit into one block and spreading it over multiple blocks, connected with a linked list.
Reading trailing columns that didn't fit into the first block requires traversing the linked list, which results in an extra I/O operation.
See this page for an illustration of ROW CHAINING in Oracle:
That's why you should put columns you often use at the beginning of the table, and columns you don't use often, or columns that tend to be NULL, at the end of the table.
Important note:
If you like this answer and want to vote for it, please also vote for @Andomar's answer.
He answered the same thing, but seems to be downvoted for no reason.
As is often the case, the biggest factor is the next guy who has to work on the system. I try to have the primary key columns first, the foreign key columns second, and then the rest of the columns in descending order of importance / significance to the system.
Column order had a big performance impact on some of the databases I've tuned, spanning SQL Server, Oracle, and MySQL. This post has good rules of thumb:
One example of the performance difference is an index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:
SomeId int,
SomeString varchar(100),
SomeValue int
The engine cannot know in advance where SomeValue starts, because SomeString has an unknown length, so it has to work that out for every row. However, if you change the order to:
SomeId int,
SomeValue int,
SomeString varchar(100)
Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.
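The difference between the two layouts can be sketched with a small Python model. This is only an illustration under simplifying assumptions: it ignores row headers, NULL bitmaps and length prefixes, and the `locate` helper and the sample string length are invented for the example rather than taken from any real engine.

```python
# Hypothetical column descriptions: (name, fixed size in bytes, or None if variable).
LAYOUT_A = [("SomeId", 4), ("SomeString", None), ("SomeValue", 4)]
LAYOUT_B = [("SomeId", 4), ("SomeValue", 4), ("SomeString", None)]


def locate(layout, target, var_lengths):
    """Return (offset, lengths_read) for `target` in one row.

    var_lengths maps each variable-length column to its actual length in this
    row; every lookup in it stands for per-row work the engine has to do.
    """
    offset = 0
    lengths_read = 0
    for name, size in layout:
        if name == target:
            return offset, lengths_read
        if size is None:
            size = var_lengths[name]
            lengths_read += 1
        offset += size
    raise KeyError(target)


row = {"SomeString": 37}  # this particular row holds a 37-byte string
print(locate(LAYOUT_A, "SomeValue", row))  # offset depends on this row's string
print(locate(LAYOUT_B, "SomeValue", row))  # offset is the same for every row
```

In the first layout the offset of SomeValue changes from row to row, while in the second it is fixed at 4 bytes and no length has to be read.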
EDIT: SQL Server 2005 stores fixed-length fields at the start of the row, and each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.
If you're going to be using UNION a lot, it makes matching columns easier if you have a convention about their ordering.
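A quick way to see why this matters: UNION pairs columns by position, not by name. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and the two tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE suppliers (id INTEGER, city TEXT, name TEXT);  -- different order
    INSERT INTO customers VALUES (1, 'Ann', 'Oslo');
    INSERT INTO suppliers VALUES (2, 'Rome', 'Bob');
""")

# SELECT * pairs columns by position, so names and cities end up mixed.
mixed = conn.execute(
    "SELECT * FROM customers UNION ALL SELECT * FROM suppliers ORDER BY 1"
).fetchall()

# Naming the columns restores the intended pairing regardless of table order.
aligned = conn.execute(
    "SELECT id, name, city FROM customers "
    "UNION ALL SELECT id, name, city FROM suppliers ORDER BY 1"
).fetchall()

print(mixed)
print(aligned)
```

With a shared ordering convention the `SELECT *` form would line up as well; without one, every UNION has to list its columns explicitly.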
In 2002, Bill Thorsteinson posted on the Hewlett Packard forums his suggestions for optimizing MySQL queries by reordering the columns. His post has since been literally copied and pasted at least a hundred times on the Internet, often without citation. To quote him exactly...
General rules of thumb:
- Primary key columns first.
- Foreign key columns next.
- Frequently-searched columns next.
- Frequently-updated columns later.
- Nullable columns last.
- Least-used nullable columns after more-frequently used nullable columns.
- Blobs in own table with few other columns.
Source: HP Forums.
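As a rough sketch, the rules above can be turned into a sort key. Everything in this Python example is an assumption made for illustration: the `Column` fields, the `usage` score, the sample columns, and the tie-breaking for columns that match more than one rule. The blob rule is not modelled, since it is about splitting tables rather than ordering columns.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    primary_key: bool = False
    foreign_key: bool = False
    searched_often: bool = False
    updated_often: bool = False
    nullable: bool = False
    usage: int = 0  # hypothetical relative usage score, higher = used more


def rank(col):
    """Sort key: lower tuples come first in the table definition."""
    if col.primary_key:
        group = 0
    elif col.foreign_key:
        group = 1
    elif col.nullable:
        group = 5  # nullable columns last
    elif col.updated_often:
        group = 4  # frequently-updated columns later
    elif col.searched_often:
        group = 2
    else:
        group = 3
    # Within a group, more heavily used columns come first.
    return (group, -col.usage)


def suggest_order(columns):
    return [c.name for c in sorted(columns, key=rank)]


cols = [
    Column("notes", nullable=True, usage=1),
    Column("status", updated_often=True),
    Column("customer_id", foreign_key=True),
    Column("email", searched_often=True),
    Column("id", primary_key=True),
    Column("nickname", nullable=True, usage=5),
]
print(suggest_order(cols))
```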
But that post was made all the way back in 2002! This advice was for MySQL version 3.23, more than six years before MySQL 5.1 would be released. And there are no references or citations. So, was Bill right? And how exactly does the storage engine work at this level?
To quote Martin Zahn, an Oracle-certified professional, in an article on The Secrets of Oracle Row Chaining and Migration...
Chained rows affect us differently. Here, it depends on the data we need. If we had a row with two columns that was spread over two blocks, the query:
SELECT column1 FROM table
where column1 is in Block 1, would not cause any «table fetch continued row». It would not actually have to get column2, it would not follow the chained row all of the way out. On the other hand, if we ask for:
SELECT column2 FROM table
and column2 is in Block 2 due to row chaining, then you would in fact see a «table fetch continued row»
The rest of the article is a rather good read! But I am only quoting the part here that is directly relevant to our question at hand.
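The quoted behaviour can be mimicked with a tiny Python model. It assumes that a chained row is a list of pieces that must be followed in order from the head piece, and that a query only goes as far as the deepest column it needs. The function name and the column-to-block mapping are illustrative and not part of Oracle.

```python
def blocks_read(column_block, selected):
    """Return how many row pieces a query must visit.

    column_block maps each column name to the index of the block holding it
    (0 is the head piece of the row).
    """
    # The chain is followed from the head piece, so reaching piece k
    # means reading pieces 0..k.
    deepest = max(column_block[c] for c in selected)
    return deepest + 1


# The two-column row from the quote, spread over two blocks.
layout = {"column1": 0, "column2": 1}
print(blocks_read(layout, ["column1"]))  # head piece only
print(blocks_read(layout, ["column2"]))  # follows the chain to the second piece
```

In this model, selecting only column1 touches one block, while selecting column2 touches two, which corresponds to the extra «table fetch continued row» described in the quote.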
More than 18 years later, I gotta say it: thanks, Bill!
The only reason I can think of is debugging and fire-fighting. We have a table whose "name" column appears about 10th in the list. It's a pain when you do a quick select * from table where id in (1,2,3) and then have to scroll across to look at the names.
But that's about it.