I want to share my experience with you on this endless war :D on natural vs surrogate key dilemma. I think that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. So depending on your situation, it might be more relevant to choose one method or the other.
As it seems that many people present surrogate keys as the almost perfect solution and natural keys as the plague, I will focus on the other point of view's arguments:
Disadvantages of surrogate keys
Surrogate keys are:
- Source of performance problems:
- They are usually implemented using auto-incremented columns which mean:
- A round-trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo alike algorithms but still those methods have their own drawbacks).
- If one-day you need to move your data from one schema to another (It happens quite regularly in my company at least) then you might encounter Id collision problems. And Yes I know that you can use UUIDs but those lasts requires 32 hexadecimal digits! (If you care about database size then it can be an issue).
- If you are using one sequence for all your surrogate keys then - for sure - you will end up with contention on your database.
- Error prone. A sequence has a max_value limit so - as a developer - you have to put attention to the following points:
- You must cycle your sequence ( when the max-value is reached it goes back to 1,2,...).
- If you are using the sequence as an ordering (over time) of your data then you must handle the case of cycling (column with Id 1 might be newer than row with Id max-value - 1).
- Make sure that your code (and even your client interfaces which should not happen as it supposed to be an internal Id) supports 32b/64b integers that you used to store your sequence values.
- They don't guarantee non duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
- More in Wikipedia...
Myths on natural keys
- Composite keys are less inefficient than surrogate keys. No! It depends on the used database engine:
- Natural keys don't exist in real-life. Sorry but they do exist! In aviation industry, for example, the following tuple will be always unique regarding a given scheduled flight (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard then this set of data is a [good] natural key candidate.
- Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a 4 columns primary-key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.
Conclusion
Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.
Hope that this helped someone!