When to use one field as primary key instead of 2?

借酒劲吻你 2020-12-12 03:11

I often see database designs like this:

Case 1:

UserTable

--id[auto increase]

--UserName

--Password

--Em

4 answers
  •  醉梦人生
    2020-12-12 03:53

    In Case 1: Why not use the UserName field as the primary key (PK)? Why use another field, such as id [which is auto-incremented], as the PK?

    UserTable.UserName has intrinsic meaning in this data model and is called a "natural key". UserTable.id, on the other hand, is a "surrogate key".

    If there is a natural key in your model, you cannot eliminate it with a surrogate key; you can only supplement it. So the question is: do you use just the natural key, or both the natural and the surrogate key? Both strategies are valid, and each has its pros and cons.
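A minimal sketch of why the surrogate key cannot eliminate the natural key, using Python's built-in `sqlite3` module. The column types and the helper name `duplicate_allowed` are assumptions made for illustration; the original schema does not specify them.

```python
import sqlite3

# Surrogate key only: nothing protects the uniqueness of UserName.
SURROGATE_ONLY = """
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL
)"""

# Surrogate key plus the natural key, declared as UNIQUE.
SURROGATE_PLUS_NATURAL = """
CREATE TABLE UserTable (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName TEXT NOT NULL UNIQUE
)"""


def duplicate_allowed(create_sql: str) -> bool:
    """Return True if the table accepts two rows with the same UserName."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_sql)
        conn.execute("INSERT INTO UserTable (UserName) VALUES ('alice')")
        try:
            conn.execute("INSERT INTO UserTable (UserName) VALUES ('alice')")
            return True
        except sqlite3.IntegrityError:
            # The UNIQUE constraint rejected the second 'alice'.
            return False
    finally:
        conn.close()
```

With only the surrogate `id`, two users named `alice` can coexist, so the natural key still has to be declared (and indexed) alongside it.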

    Typical reasons for a surrogate key:

    • To keep FKs in child tables slimmer (integer vs. string in this case), for smaller storage and better caching.
    • Avoid the need for ON UPDATE CASCADE.
    • Friendliness toward ORM tools.

    On the other hand:

    • You now have two keys instead of one, requiring an extra index, making the parent table larger and less cache-friendly, and slowing down INSERT/UPDATE/DELETE due to index maintenance. [1]
    • May require more JOIN-ing. [2]
    • And may not play well with clustering. [3]
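The extra JOIN can be sketched as follows, assuming a junction table that references the surrogate `id`. The `UserID` column, the sample rows, and the helper name `names_for_role` are hypothetical, chosen only to make the example runnable.

```python
import sqlite3


def names_for_role(role_id: int) -> list[str]:
    """Return the user names holding a role, via a JOIN on the surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserTable (
            id       INTEGER PRIMARY KEY,
            UserName TEXT NOT NULL UNIQUE
        );
        -- The junction table stores only the slim integer key.
        CREATE TABLE UserRoleTable (
            UserID INTEGER NOT NULL REFERENCES UserTable(id),
            RoleID INTEGER NOT NULL,
            PRIMARY KEY (UserID, RoleID)
        );
        INSERT INTO UserTable (id, UserName) VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO UserRoleTable (UserID, RoleID)
            VALUES (1, 10), (2, 10), (1, 20);
    """)
    rows = conn.execute(
        """
        SELECT u.UserName
        FROM UserRoleTable AS ur
        JOIN UserTable AS u ON u.id = ur.UserID
        WHERE ur.RoleID = ?
        ORDER BY u.UserName
        """,
        (role_id,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Had the junction table stored UserName directly, the same question could be answered from UserRoleTable alone, at the cost of a wider foreign key.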

    In case of just UserName and Email, why not use Email as PK?

    The designer probably wanted to avoid the ON UPDATE CASCADE that would be necessary if a user changed their e-mail.
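To illustrate what that cascade involves, here is a small sketch using Python's `sqlite3` module with Email as the PK. The child table `LoginLog`, its columns, and the sample addresses are assumptions for the example and are not part of the original design.

```python
import sqlite3


def child_emails_after_change() -> list[str]:
    """Change a user's e-mail and return the e-mails stored in the child table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE UserTable (
            Email    TEXT PRIMARY KEY,
            UserName TEXT NOT NULL UNIQUE
        );
        -- Hypothetical child table that references the natural key.
        CREATE TABLE LoginLog (
            Email    TEXT NOT NULL
                     REFERENCES UserTable(Email) ON UPDATE CASCADE,
            LoggedAt TEXT NOT NULL
        );
        INSERT INTO UserTable VALUES ('old@example.com', 'alice');
        INSERT INTO LoginLog VALUES
            ('old@example.com', '2020-12-01'),
            ('old@example.com', '2020-12-02');
    """)
    # One logical change in the parent rewrites every referencing child row.
    conn.execute(
        "UPDATE UserTable SET Email = 'new@example.com' "
        "WHERE Email = 'old@example.com'"
    )
    emails = [email for (email,) in conn.execute("SELECT Email FROM LoginLog")]
    conn.close()
    return emails
```

Every child row is rewritten when the parent key changes. With an integer surrogate as the FK target, the same e-mail change would touch only the single row in UserTable.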

    In Case 2: In the UserRoleTable, why not use both UserName and RoleID as PK?

    If there cannot be multiple connections for the same user/role pair, you have to have a key on that in any case.

    Unless there are child tables with FKs referencing UserRoleTable, or an unfriendly ORM is used, there is no reason for an additional surrogate PK.
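A sketch of the composite-key approach for Case 2, assuming UserRoleTable holds just UserName and RoleID (the question's Case 2 schema is not shown, so the column types and the helper name `try_assign` are guesses for illustration):

```python
import sqlite3


def try_assign(conn: sqlite3.Connection, user_name: str, role_id: int) -> bool:
    """Insert a user/role pair; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO UserRoleTable (UserName, RoleID) VALUES (?, ?)",
            (user_name, role_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate pair.
        return False


role_conn = sqlite3.connect(":memory:")
role_conn.execute("""
    CREATE TABLE UserRoleTable (
        UserName TEXT    NOT NULL,
        RoleID   INTEGER NOT NULL,
        PRIMARY KEY (UserName, RoleID)  -- the pair itself is the key
    )
""")

assignment_results = [
    try_assign(role_conn, "alice", 1),  # new pair
    try_assign(role_conn, "alice", 2),  # same user, different role
    try_assign(role_conn, "alice", 1),  # duplicate pair
]
```

The composite key already enforces "one row per user/role pair", so a separate auto-increment id would add an index without adding a constraint.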


    [1] And if clustering is used, the secondary index under the natural key may be extra "fat", since it contains a copy of the clustering key (which is typically the PK). It may also require a double lookup when querying: rows in a clustered table don't have stable physical locations, so they must be located through the clustering key, barring some DBMS-specific optimizations such as Oracle's "rowid guesses".

    [2] E.g. you wouldn't be able to find UserName just by reading the junction table; you'd have to JOIN it with UserTable.

    [3] Surrogates are typically ordered in a way that is not meaningful to client applications. An auto-increment surrogate key's order depends on the order of INSERTs, and querying is not typically done on a "range of users by their order of insertion". Some surrogates, such as GUIDs, may be more or less randomly ordered.
