What's the best practice for primary keys in tables?

别那么骄傲 2020-11-22 14:02

When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements.

  • 2020-11-22 14:13

    Here are my own rules of thumb, which I have settled on after 25+ years of development experience.

    • All tables should have a single-column primary key that auto-increments.
    • Include it in any view that is meant to be updatable.
    • The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, an account number, an employee ID, or any other information that is meaningful to your application. It is merely a unique key associated with an entity.

    The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.

    Always having a single value primary key makes performing UPSERTs very straightforward.

    Use additional indices to support multi-column keys which have meaning in your application.
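
    A minimal sketch of this setup, using Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for whatever engine you use (the `employee` table, its columns, and the `upsert_employee` helper are hypothetical). It shows a meaningless auto-incrementing surrogate key, the meaningful two-column key enforced by a separate unique index, and an UPSERT keyed on the single-column primary key. The `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or later; other engines spell UPSERT differently (e.g. `MERGE` in SQL Server).

```python
import sqlite3


def make_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE employee (
            id          INTEGER PRIMARY KEY AUTOINCREMENT, -- meaningless surrogate key
            company     TEXT NOT NULL,
            employee_no TEXT NOT NULL,                     -- meaningful to the application
            name        TEXT NOT NULL
        );
        -- The meaningful multi-column key is enforced by a separate unique index
        CREATE UNIQUE INDEX ux_employee_company_no ON employee (company, employee_no);
    """)


def upsert_employee(conn, row_id, company, employee_no, name):
    """Insert a new row when row_id is None, otherwise update the row with that id.
    Returns the surrogate id of the affected row."""
    cur = conn.execute(
        """
        INSERT INTO employee (id, company, employee_no, name)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (id) DO UPDATE SET
            company     = excluded.company,
            employee_no = excluded.employee_no,
            name        = excluded.name
        """,
        (row_id, company, employee_no, name),
    )
    # A NULL id makes SQLite assign the next rowid, so lastrowid is the new key
    return cur.lastrowid if row_id is None else row_id


conn = sqlite3.connect(":memory:")
make_schema(conn)
emp_id = upsert_employee(conn, None, "ACME", "E-001", "Ada")           # insert
upsert_employee(conn, emp_id, "ACME", "E-001", "Ada Lovelace")         # update
print(conn.execute("SELECT id, name FROM employee").fetchall())        # [(1, 'Ada Lovelace')]
```

    Because the conflict target is just `id`, the statement never needs to know which business columns identify the row; a duplicate `(company, employee_no)` pair is still rejected by the unique index with an `IntegrityError`.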

  • 2020-11-22 14:15

    I look for natural primary keys and use them where I can.

    If no natural keys can be found, I prefer a GUID to an auto-incrementing INT (INT++), because SQL Server uses trees for its indexes, and always adding keys to the end of a tree is bad.

    On tables that are many-to-many link tables, I use a compound primary key made of the foreign keys.

    Because I'm lucky enough to use SQL Server, I can study execution plans and statistics with SQL Profiler and Query Analyzer and easily find out how my keys are performing.
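
    A rough sketch of the GUID and link-table points, using Python's `sqlite3` with SQLite standing in for SQL Server (so it demonstrates the key structure only, not SQL Server's tree insert behavior). The `student`, `course`, and `enrollment` tables are hypothetical; GUIDs are generated in the application with `uuid.uuid4()` and stored as text, and the link table's primary key is the pair of foreign keys, so the same pairing cannot be stored twice.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE student (
        id   TEXT PRIMARY KEY,   -- GUID surrogate key stored as text
        name TEXT NOT NULL
    );
    CREATE TABLE course (
        id    TEXT PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Many-to-many link table: the compound primary key is the pair of foreign keys
    CREATE TABLE enrollment (
        student_id TEXT NOT NULL REFERENCES student (id),
        course_id  TEXT NOT NULL REFERENCES course (id),
        PRIMARY KEY (student_id, course_id)
    );
""")


def new_id() -> str:
    # Random (version 4) UUID; successive values arrive in no particular order
    return str(uuid.uuid4())


s, c = new_id(), new_id()
conn.execute("INSERT INTO student VALUES (?, ?)", (s, "Ada"))
conn.execute("INSERT INTO course VALUES (?, ?)", (c, "Databases"))
conn.execute("INSERT INTO enrollment VALUES (?, ?)", (s, c))

# A second identical pairing violates the compound primary key
try:
    conn.execute("INSERT INTO enrollment VALUES (?, ?)", (s, c))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```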

  • 2020-11-22 14:15

    I'll be up-front about my preference for natural keys: use them where possible, as they'll make database administration a lot easier. I established a standard in our company that all tables have the following columns:

    • Row ID (GUID)
    • Creator (string; has a default of the current user's name (SUSER_SNAME() in T-SQL))
    • Created (DateTime)
    • Timestamp

    Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.

    Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

    • People - use a surrogate key, e.g. INT. For internal users, the Active Directory user GUID is an acceptable choice.
    • Lookup tables (e.g. StatusCodes) - use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample")
    • Linking tables - combination of FKs (e.g. EventId, AttendeeId)

    So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.

    Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!
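
    A sketch of this layout under stated assumptions: SQLite via Python's `sqlite3`, with the `status_code` and `sample` tables, the `sample_ref` natural key, and the `add_sample` helper all hypothetical. The GUID row ID is generated with `uuid.uuid4()` in the application. SQLite has no `SUSER_SNAME()` equivalent, so the creator is passed in explicitly, and it has no direct counterpart to SQL Server's `timestamp`/`rowversion` column, so that column is omitted.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Lookup table keyed by a short, human-readable code
    CREATE TABLE status_code (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    INSERT INTO status_code VALUES
        ('A', 'Approved'),
        ('E', 'Expired'),
        ('NADIS', 'No Asbestos Detected In Sample');

    CREATE TABLE sample (
        sample_ref TEXT PRIMARY KEY,                          -- natural key (hypothetical lab reference)
        status     TEXT NOT NULL REFERENCES status_code (code),
        row_id     TEXT NOT NULL UNIQUE,                      -- ORM-friendly GUID, one per row
        creator    TEXT NOT NULL,                             -- supplied by the app in this sketch
        created    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def add_sample(conn, sample_ref, status, creator):
    """Insert a sample and return its generated GUID row ID."""
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO sample (sample_ref, status, row_id, creator) VALUES (?, ?, ?, ?)",
        (sample_ref, status, row_id, creator),
    )
    return row_id


rid = add_sample(conn, "LAB-2020-0001", "NADIS", "jsmith")
# An ORM can look rows up by row_id; people can read sample_ref and status directly
row = conn.execute(
    "SELECT s.sample_ref, c.description "
    "FROM sample s JOIN status_code c ON c.code = s.status "
    "WHERE s.row_id = ?",
    (rid,),
).fetchone()
```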

  • 2020-11-22 14:16

    What is special about the primary key?

    What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.

    Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).

    • ASIDE: The unfortunate side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application-driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table-at-a-time designs. “Partial records” are created in tables and data filled in over time. Multi-table interaction is avoided or, when used, causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.

    It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but doing it for every table is surely a sign of poor schema design.

    It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.
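
    To make the status example concrete, here is a small sketch (SQLite through Python's `sqlite3`; the `status` and `orders` tables are hypothetical). Only a non-key column is updated and every foreign key stays valid, yet every order that referenced the surrogate key now reports a different status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (
        id          INTEGER PRIMARY KEY,   -- surrogate key
        description TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES status (id)
    );
    INSERT INTO status VALUES (1, 'Processed');
    INSERT INTO orders VALUES (100, 1), (101, 1);
""")


def order_statuses(conn):
    """What each order appears to mean once the surrogate key is resolved."""
    return conn.execute(
        "SELECT o.id, s.description FROM orders o "
        "JOIN status s ON s.id = o.status_id ORDER BY o.id"
    ).fetchall()


before = order_statuses(conn)
# No key changes and no constraint fires, but the meaning of every order changes
conn.execute("UPDATE status SET description = 'Cancelled' WHERE id = 1")
after = order_statuses(conn)
```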

    Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people who understand proper database design. On the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up C. J. Date. You can also reference Joe Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.

  • 2020-11-22 14:17

    I follow a few rules:

    1. Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character types. This matters because most primary keys will also be foreign keys in other tables and will appear in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
    2. Primary keys should never change. Updating a primary key should always be out of the question, because it is most likely used in multiple indexes and as a foreign key. Updating a single primary key could cause a ripple effect of changes.
    3. Do NOT use a key from your problem domain as your logical model's primary key, for example a passport number, social security number, or employee contract number, as these "keys" can change in real-world situations.
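
    A back-of-the-envelope sketch of the size argument in rule 1. It counts only the raw bytes of key values repeated in several places (primary key index, foreign-key columns, secondary indexes); real engines add row and page overhead, and the 10 million rows and three copies used below are illustrative assumptions, not measurements.

```python
import struct
import uuid

# Raw per-value size of common key representations (no row/page overhead)
KEY_SIZES = {
    "INT (32-bit)": struct.calcsize("<i"),
    "BIGINT (64-bit)": struct.calcsize("<q"),
    "UUID as BINARY(16)": len(uuid.uuid4().bytes),
    "UUID as CHAR(36)": len(str(uuid.uuid4())),
    "CHAR(20) account number": 20,
}


def key_bytes(rows: int, copies: int) -> dict:
    """Bytes spent on key values alone when each row's key is stored `copies` times."""
    return {name: size * rows * copies for name, size in KEY_SIZES.items()}


for name, total in key_bytes(rows=10_000_000, copies=3).items():
    print(f"{name:26} {total / 2**20:8.1f} MiB")
```

    With these inputs the INT keys come to roughly 114 MiB, while CHAR(36) UUIDs come to roughly 1,030 MiB, before any overhead.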

    On surrogate vs. natural keys, I refer to the rules above. If the natural key is small and will never change, it can be used as the primary key. If the natural key is large or likely to change, I use a surrogate key. If there is no primary key, I still make a surrogate key, because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
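
    A sketch of the ripple effect from rule 2 when a mutable natural key (here a hypothetical employee contract number, as in rule 3) is the primary key. It uses SQLite via Python's `sqlite3` with `ON UPDATE CASCADE` on the foreign key; the `employee` and `timesheet` tables are hypothetical. One logical correction rewrites every referencing row and its index entries, whereas with a surrogate key the same correction would touch only the `employee` row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for the cascade to run in SQLite
conn.executescript("""
    -- Natural key that turns out to be mutable
    CREATE TABLE employee (
        contract_no TEXT PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        id          INTEGER PRIMARY KEY,
        contract_no TEXT NOT NULL REFERENCES employee (contract_no) ON UPDATE CASCADE,
        hours       REAL NOT NULL
    );
    -- Index entries on the foreign key must be rewritten too
    CREATE INDEX ix_timesheet_contract ON timesheet (contract_no);
    INSERT INTO employee VALUES ('C-1001', 'Ada');
    INSERT INTO timesheet (contract_no, hours) VALUES ('C-1001', 7.5), ('C-1001', 8.0);
""")

# One "small" change to the key cascades into every child row
conn.execute("UPDATE employee SET contract_no = 'C-2001' WHERE contract_no = 'C-1001'")
child_keys = conn.execute("SELECT contract_no FROM timesheet ORDER BY id").fetchall()
```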

  • 2020-11-22 14:17

    All tables should have a primary key. Otherwise, what you have is a HEAP. In some situations this might be what you want (for instance, under a heavy insert load when the data is then replicated via Service Broker to another database or table).

    For lookup tables with a low volume of rows, you can use a CHAR(3) code as the primary key, as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT, unless you have a reference table that perhaps has a composite primary key made up of foreign keys from associated tables.
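
    A quick arithmetic sketch of the CHAR(3)-versus-INT trade-off, assuming a 4-byte INT and a fixed 3-byte code in a single-byte character set (the `USD` code is a hypothetical example). The saving is one byte per referencing row, which is consistent with the point that the difference is negligible.

```python
import struct

INT_BYTES = struct.calcsize("<i")          # 4-byte integer key
CHAR3_BYTES = len("USD".encode("ascii"))   # fixed 3-character code, 1 byte per character assumed


def fk_bytes_saved(referencing_rows: int) -> int:
    """Raw bytes saved in one foreign-key column by using CHAR(3) instead of INT."""
    return referencing_rows * (INT_BYTES - CHAR3_BYTES)


print(fk_bytes_saved(1_000_000))  # 1000000 (bytes), i.e. under 1 MiB
```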
