Database “key/ID” design ideas, Surrogate Key, Primary Key, etc

北荒 2021-02-09 16:15

So I've seen several mentions of a surrogate key lately, and I'm not really sure what it is and how it differs from a primary key.

I always assumed that ID was my prim

7 Answers
  • 2021-02-09 16:32

    First, a Surrogate key is a key that is artificially generated within the database, as a unique value for each row in a table, and which has no dependency whatsoever on any other attribute in the table.

    Now, the phrase Primary Key is a red herring. Whether a key is the primary key or an alternate key doesn't mean anything. What matters is what the key is used for. Keys can serve two functions which are fundamentally inconsistent with one another.

    1. They are first and foremost there to ensure the integrity and consistency of your data! Each row in a table represents an instance of whatever entity that table is defined to hold data for. No Surrogate Key, by definition, can ever perform this function. Only a properly designed natural Key can do this. (If all you have is a surrogate key, you can always add another row with every other attribute exactly identical to an existing row, as long as you give it a different surrogate key value.)
    2. Secondly, they are there to serve as references (pointers) for the foreign Keys in other tables which are child entities of an entity in the table with the Primary Key. A Natural Key (especially if it is a composite of multiple attributes) is not a good choice for this function because it would mean that A) the foreign keys in all the child tables would also have to be composite keys, making them very wide and thereby decreasing the performance of all constraint operations and of SQL joins, and B) if the value of the key changed in the main table, you would be required to do cascading updates on every table where the value was represented as an FK.

    So the answer is simple... Always (wherever you care about data integrity/consistency) use a natural key and, where necessary, use both! When the natural key is a composite, or long, or not stable enough, add an alternate Surrogate key (an auto-incrementing integer, for example) to serve as the target of FKs in child tables. But unless you are willing to risk the data consistency of your table, DO NOT remove the natural key from the main table.
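The "use both" arrangement can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table and column names (`bank_account`, `account_transaction`, `routing_number`, and so on) are hypothetical, and SQLite stands in for whatever database you actually use. The surrogate `id` is the FK target, while the composite natural key stays on the parent table as a `UNIQUE` constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked

conn.executescript("""
CREATE TABLE bank_account (
    id             INTEGER PRIMARY KEY,        -- surrogate key, used as the FK target
    routing_number TEXT NOT NULL,
    account_number TEXT NOT NULL,
    UNIQUE (routing_number, account_number)    -- natural key, kept for integrity
);
CREATE TABLE account_transaction (
    id           INTEGER PRIMARY KEY,
    account_id   INTEGER NOT NULL REFERENCES bank_account(id),  -- narrow single-column FK
    amount_cents INTEGER NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO bank_account (routing_number, account_number) VALUES (?, ?)",
    ("12345678932154", "9876543210123"),
)
account_id = cur.lastrowid
conn.execute(
    "INSERT INTO account_transaction (account_id, amount_cents) VALUES (?, ?)",
    (account_id, 12312),
)

# The natural key value changes, but no child row has to be touched,
# because the children point at the surrogate id.
conn.execute(
    "UPDATE bank_account SET account_number = ? WHERE id = ?",
    ("1111111111111", account_id),
)
linked = conn.execute(
    """SELECT COUNT(*) FROM account_transaction t
       JOIN bank_account a ON a.id = t.account_id
       WHERE a.account_number = ?""",
    ("1111111111111",),
).fetchone()[0]
```

After the update, `linked` still counts the one transaction, which is the point of item 2 above: a change to the natural key does not cascade into the child table.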

    To make this crystal clear, let's work through an example. Say you have a table with bank accounts in it... A natural Key might be the Bank Routing Number and the Account Number at the bank. To avoid using this two-column composite key in every transaction record in the transactions table, you might decide to put an artificially generated surrogate key on the BankAccount table which is just an integer. But you had better keep the natural Key! If you did not also have the composite natural key, you could quite easily end up with two rows in the table as follows

    id  BankRoutingNumber BankAccountNumber   BankBalance
     1     12345678932154   9876543210123       $123.12
     2     12345678932154   9876543210123    ($3,291.62)
    

    Now, which one is right?
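A minimal sketch of that situation, again using Python's `sqlite3` with hypothetical table and column names. The helper `insert_twice` is made up for this illustration: it tries to store the two conflicting rows from the table above, once with only the surrogate key and once with the natural key also declared `UNIQUE`.

```python
import sqlite3

def insert_twice(with_natural_key: bool) -> int:
    """Try to insert the same bank account twice; return the number of rows stored."""
    conn = sqlite3.connect(":memory:")
    unique_clause = ", UNIQUE (routing_number, account_number)" if with_natural_key else ""
    conn.execute(
        "CREATE TABLE bank_account ("
        "id INTEGER PRIMARY KEY, "            # surrogate key only
        "routing_number TEXT NOT NULL, "
        "account_number TEXT NOT NULL, "
        "balance_cents INTEGER NOT NULL"
        f"{unique_clause})"
    )
    rows = [
        ("12345678932154", "9876543210123", 12312),
        ("12345678932154", "9876543210123", -329162),
    ]
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO bank_account (routing_number, account_number, balance_cents) "
                "VALUES (?, ?, ?)",
                row,
            )
        except sqlite3.IntegrityError:
            pass  # the natural-key constraint rejected the duplicate account
    return conn.execute("SELECT COUNT(*) FROM bank_account").fetchone()[0]

rows_without_natural_key = insert_twice(with_natural_key=False)
rows_with_natural_key = insert_twice(with_natural_key=True)
```

With only the surrogate key, both rows are accepted and the table holds two balances for one account. With the natural key declared, the second insert is rejected and only one row remains.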

    To marc, from the comments below: what good does it do you to be able to "identify the row"? No good at all, it seems to me, because what we need to be able to identify is which bank account the row represents! Identifying the row is only important for internal database technical functions, like joins in queries, or for FK constraint operations, which, if/when they are necessary, should be using a surrogate key anyway, not the natural key.

    You are right in that a poor choice of a natural key, or sometimes even the best available choice of a natural key, may not be truly unique, or guaranteed to prevent duplicates. But any choice is better than no choice, as it will at least prevent duplicate rows for the same values in the attributes chosen as the natural key. These issues can be kept to a minimum by the appropriate choice of key attributes, but sometimes they are unavoidable and must be dealt with. It is still better to do so than to allow incorrect, inaccurate, or redundant data into the database.

    As to "ease of use": if all you are using the natural key for is to prevent the insertion of duplicate rows, and you are using another, surrogate, key as the target for FK constraints, I do not see any ease-of-use issues of concern.

  • 2021-02-09 16:35

    Users Table

    Using a Guid as a primary key for your Users table is perfect.

    LogEntry table

    Unless you plan to expose your LogEntry data to an external system or merge it with another database, I would simply use an incrementing int rather than a Guid as the primary key. It's easier to work with and will use slightly less space, which could be significant in a huge log stretching several years.
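As a rough back-of-the-envelope check on the space argument, the sketch below compares the raw key widths. It assumes a 4-byte integer column and a 16-byte GUID, picks a hypothetical row count, and ignores index and page overhead, so treat the result as an order-of-magnitude estimate only.

```python
import uuid

GUID_BYTES = len(uuid.uuid4().bytes)  # a GUID/UUID is 128 bits, i.e. 16 bytes
INT_BYTES = 4                         # assumption: a 32-bit integer key column

def extra_key_bytes(row_count: int) -> int:
    """Extra raw storage used by GUID keys compared with 4-byte integer keys."""
    return (GUID_BYTES - INT_BYTES) * row_count

# Hypothetical log table with 500 million entries.
extra = extra_key_bytes(500_000_000)
extra_gb = extra / 1_000_000_000
```

For this assumed row count the difference is 6 GB for the key column alone, and it is repeated in every index that includes the key.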

  • 2021-02-09 16:38
    • The primary key is whatever you make it. Whatever you define as the primary key is the primary key. Usually it's an integer ID field.
    • The surrogate key is also this ID field. It's a surrogate for the natural key, which defines uniqueness in terms of your application data.

    The idea behind having an integer ID as the primary key (even if it doesn't really mean anything) is for indexing purposes. You would then probably define a natural key as a unique constraint on your table. This way you get the best of both worlds: fast indexing with your ID field, and each row still maintains its natural uniqueness.

    That said, some people swear by just using a natural key.

  • 2021-02-09 16:39

    There are actually three kinds of keys to talk about. The primary key is what is used to uniquely identify every row in a table. The surrogate key is an artificial key that is created with that property. A natural key is a primary key which is derived from the actual real life data.

    In some cases the natural key may be unwieldy so a surrogate key may be created to be used as a foreign key, etc. For example, in a log or diary the PK might be the date, time, and the full text of the entry (if it is possible to add two entries at the exact same time). Obviously it would be a bad idea to use all of that every time that you wanted to identify a row, so you might make a "log id". It might be a sequential number (the most common) or it might be the date plus a sequential number (like 20091222001) or it might be something else. Some natural keys may work well as a primary key though, such as vehicle VIN numbers, student ID numbers (if they are not reused), or in the case of joining tables the PKs of the two tables being joined.
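The "date plus a sequential number" style of log id mentioned above (like 20091222001) could be built along these lines. The function name `make_log_id` and the three-digit sequence width are assumptions made for this sketch, inferred only from the single example value.

```python
from datetime import date

def make_log_id(day: date, sequence: int) -> int:
    """Build a log id from the date (YYYYMMDD) followed by a 3-digit daily sequence."""
    if not 1 <= sequence <= 999:
        # Three digits leave room for at most 999 entries per day.
        raise ValueError("sequence must be between 1 and 999")
    return int(f"{day:%Y%m%d}{sequence:03d}")

first_entry = make_log_id(date(2009, 12, 22), 1)
second_entry = make_log_id(date(2009, 12, 22), 2)
```

Here `first_entry` is 20091222001. The fixed width is a design trade-off: ids sort chronologically, but the scheme breaks once a day needs more entries than the sequence allows.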

    This is just an overview of table key selection. There's a lot to consider, although in most shops you'll find that they go with, "add an identity column to every table and that's our primary key". You then get all of the problems that go with that.

    In your case I think that a LogEntryID for your log items seems reasonable. Is the ID an FK to the Users table? If not then I might question having both the ID and the LogEntryID in the same table as they are redundant. If it is, then I would change the name to user_id or something similar.

  • 2021-02-09 16:41

    No, your ID can be both a surrogate key (which just means it's not "derived from application data", e.g. an artificial key), and it should be your primary key, too.

    The primary key is used to uniquely and safely identify any row in your table. It has to be stable, unique, and NOT NULL - an "artificial" ID usually has those properties.

    I would normally recommend against using "natural" or real data for primary keys. Are you REALLY 150% sure it's NEVER going to change? The Swiss equivalent of the SSN, for instance, changes each time a woman marries (or gets divorced), which makes it hardly an ideal candidate. And it's not guaranteed to be unique, either.

    To spare yourself all that grief, just use a surrogate (artificial) ID that is system-defined, unique, and never changes and never has any application meaning (other than being your unique ID).

    Scott Ambler has a pretty good article here which has a "glossary" of all the various keys and what they mean - you'll find natural, surrogate, primary key and a few more.

  • 2021-02-09 16:47

    Wow, you opened a can of worms with this question. Database purists will tell you never to use surrogate keys (like you have above). On the other hand, surrogate keys can have some tremendous benefits. I use them all the time.

    In SQL Server, a surrogate key is typically an auto-increment Identity value that SQL Server generates for you. It has NO relationship to the actual data stored in the table. The opposite of this is a Natural key. An example might be Social Security number. This does have a relationship to the data stored in the table. There are benefits to natural keys, but, IMO, the benefits to using surrogate keys outweigh natural keys.

    I noticed in your example, you have a GUID for a primary key. You generally want to stay away from GUIDs as primary keys. They are big, bulky and can often be inserted into your database in a random way, causing major fragmentation.
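The "inserted in a random way" point can be illustrated by counting how often a newly generated key sorts before the key generated just ahead of it. This sketch uses Python's `uuid.uuid4` and Python's own UUID ordering as a stand-in; SQL Server orders GUID values by its own rules and the real cost depends on the storage engine, so it shows only the ordering behaviour, not actual fragmentation. The helper name `count_out_of_order` is made up for the example.

```python
import uuid

def count_out_of_order(keys):
    """Count keys that sort before the key generated immediately ahead of them."""
    return sum(1 for prev, cur in zip(keys, keys[1:]) if cur < prev)

# Auto-increment style keys always arrive in ascending order.
int_keys = list(range(1, 1001))

# Random GUIDs arrive in no particular order.
guid_keys = [uuid.uuid4() for _ in range(1000)]

int_disorder = count_out_of_order(int_keys)
guid_disorder = count_out_of_order(guid_keys)
```

`int_disorder` is zero, so every new integer key lands at the end of an ordered index. Roughly half of the random GUIDs land somewhere in the middle, which is the insertion pattern that the answer blames for fragmentation.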

    Randy
