So I\'ve seen several mentions of a surrogate key lately, and I\'m not really sure what it is and how it differs from a primary key.
I always assumed that ID was my prim
The reason that database purists get all up in arms about surrogate keys is because, if used improperly, they can allow data duplication, which is one of the evils that good database design is meant to banish.
For instance, suppose that I had a table of email addresses for a mailing list. I would want them to be unique, right? There's no point in having 2, 3, or n entries of the same email address. If I use email_address
as my primary key ( which is a natural key -- it exists as data independently of the database structure you've created ), this will guarantee that I will never have a duplicate email address in my mailing list.
However, if I have a field called id
as a surrogate key, then I can have any number of duplicate email addresses. This becomes bad if there are then 10 rows of the same email address, all with conflicting subscription information in other columns. Which one is correct, if any? There's no way to tell! After that point, your data integrity is borked. There's no way to fix the data but to go through the records one by one, asking people what subscription information is really correct, etc.
The reason why non-purists want it is because it makes it easy to use standardized code, because you can rely on refering to a single database row with an integer value. If you had a natural key of, say, the set ( client_id, email, category_id )
, the programmer is going to hate coding around this instance! It kind of breaks the encapsulation of class-based coding, because it requires the programmer to have deep knowledge of table structure, and a delete method may have different code for each table. Yuck!
So obviously this example is over-simplified, but it illustrates the point.