I\'m working with a client who wants to add timestamps to a bunch of tables so that they may sort the records in those tables chronologically. All of the tables also have a
Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.
In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records his the DATABASE; that depends on the DB implementation.
Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.
You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.
Don't optimize without measuring. Get it working right first, and THEN optimize.
You didnt mention if you are running on a single db or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1,3,2,4,5 sort of list...
Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.
You can acheive the same goal in the short term by sorting on the ID column. This would be better that adding additional data to acheive the same result. I don't think that it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.
There are a few drawbacks or limitations that I see however.
Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.
You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.
A datetime column whose default value is GETDATE()
has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.
These days, I always put a CreateDate
column data objects connected to real world events (such as account creation).
Edited to add:
If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.