Pros & Cons of Date Column as Part of Primary Key

Submitted by 一笑奈何 on 2020-03-03 11:44:45

Question


I am currently working on a database where a log is required to track a bunch of different data changes. Stuff like price changes, project status changes, etc. To accomplish this I've made different 'log' tables that will store the data that needs to be kept.

To give a solid example, in order to track the changing prices for parts which need to be ordered, I've created a table called Part_Price_Log. The primary key is a composite made up of the date on which the part price is modified and a foreign key to the part's unique ID in the Parts table.

My logic here is that if you need to look up the current price for a part, you just need to find the most recent entry for that Part ID. However, I am being told not to implement it this way because using a date as part of a primary key is an easy way to get errors in your data.

So my question is this:

What are the pros/cons of using a Date column as part of a composite primary key? What are some better alternatives?
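To make the design in the question concrete, here is a minimal sketch using Python's built-in sqlite3 module. This is SQLite, not SQL Server, so types and physical storage differ. The PriceCents column and the current_price helper are hypothetical additions, since the question only describes the key columns. The sketch shows the "most recent entry wins" lookup that the question relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key as described in the question: (part, modification date).
# PriceCents is a hypothetical column; integer cents avoid floating-point money.
conn.execute("""
    CREATE TABLE Part_Price_Log (
        PartID       INTEGER NOT NULL,
        ModifiedDate TEXT    NOT NULL,  -- ISO-8601 'YYYY-MM-DD' text sorts chronologically
        PriceCents   INTEGER NOT NULL,
        PRIMARY KEY (PartID, ModifiedDate)
    )""")
conn.executemany(
    "INSERT INTO Part_Price_Log (PartID, ModifiedDate, PriceCents) VALUES (?, ?, ?)",
    [(1, "2018-01-10", 450), (1, "2018-03-02", 475), (2, "2018-02-15", 1200)],
)

def current_price(part_id):
    """Current price = the most recent log entry for the part (None if it has none)."""
    row = conn.execute(
        """SELECT PriceCents FROM Part_Price_Log
           WHERE PartID = ?
           ORDER BY ModifiedDate DESC
           LIMIT 1""",
        (part_id,),
    ).fetchone()
    return None if row is None else row[0]

print(current_price(1))  # 475
```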


Answer 1:


In general, I think the best primary keys are synthetic auto-incremented keys. These have certain advantages:

  • The key value records the insertion order.
  • The keys are fixed length (typically 4 bytes).
  • Single keys are much simpler for foreign key references.
  • In databases (such as SQL Server by default) that cluster the data based on the primary key, inserts go "at the end".
  • They are relatively easy to type and compare (my eyes just don't work well for comparing UUIDs).

The fourth of these is a really big concern in a database that has lots of inserts, as suggested by your data.

There is nothing a priori wrong with composite primary keys. They are sometimes useful. But that is not a direction I would go in.
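Here is a minimal SQLite sketch of the advantages listed above, again assuming Python's sqlite3 module. SQLite spells an auto-incremented key as INTEGER PRIMARY KEY AUTOINCREMENT; SQL Server uses IDENTITY, as in Answer 2. The Price_Change_Note table and the PriceCents column are hypothetical. They are included only to show that a child table needs just one column to reference a single log row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Part_Price_Log (
        LogID        INTEGER PRIMARY KEY AUTOINCREMENT,  -- synthetic, ever-increasing key
        PartID       INTEGER NOT NULL,
        ModifiedDate TEXT    NOT NULL,
        PriceCents   INTEGER NOT NULL
    );
    -- Hypothetical child table: one column is enough to reference a log row.
    CREATE TABLE Price_Change_Note (
        NoteID INTEGER PRIMARY KEY,
        LogID  INTEGER NOT NULL REFERENCES Part_Price_Log (LogID),
        Note   TEXT    NOT NULL
    );
""")

ids = []
for part_id, day, cents in [(7, "2018-04-26", 450), (3, "2018-04-26", 990), (7, "2018-04-27", 475)]:
    cur = conn.execute(
        "INSERT INTO Part_Price_Log (PartID, ModifiedDate, PriceCents) VALUES (?, ?, ?)",
        (part_id, day, cents),
    )
    ids.append(cur.lastrowid)  # the generated LogID

# The note references the third price change by its LogID alone.
conn.execute("INSERT INTO Price_Change_Note (LogID, Note) VALUES (?, ?)", (ids[2], "supplier increase"))
print(ids)  # [1, 2, 3]: key order matches insertion order
```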




Answer 2:


Pros and cons will vary depending on the performance requirements and how often you will query this table.

As a first example, consider the following:

CREATE TABLE Part_Price_Log (
    ModifiedDate DATE,
    PartID INT,
    PRIMARY KEY (ModifiedDate, PartID))

If ModifiedDate comes first and this is a logging table with insert-only rows, then every new row will be placed at the end, which is good (it reduces fragmentation). This layout is also good when you want to filter directly by ModifiedDate, or by ModifiedDate + PartID, because ModifiedDate is the leading column of the primary key. The con is searching by PartID alone: the clustered index of the primary key cannot seek directly on PartID, so such queries have to scan.
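One way to check the seek-versus-scan claim is to ask the database for its query plan. The sketch below assumes SQLite via Python's sqlite3. A WITHOUT ROWID table stores rows in primary-key order, which is the closest SQLite analogue to a SQL Server clustered primary key. EXPLAIN QUERY PLAN reports SEARCH for an index seek and SCAN for a full pass. SQL Server shows the equivalent in its execution plans, with different wording. The plan helper is a hypothetical convenience function.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows are stored in primary-key order, roughly like a clustered PK.
conn.execute("""
    CREATE TABLE Part_Price_Log (
        ModifiedDate TEXT    NOT NULL,
        PartID       INTEGER NOT NULL,
        PRIMARY KEY (ModifiedDate, PartID)
    ) WITHOUT ROWID""")

by_date = plan(conn, "SELECT * FROM Part_Price_Log WHERE ModifiedDate = ?", ("2018-04-26",))
by_part = plan(conn, "SELECT * FROM Part_Price_Log WHERE PartID = ?", (7,))
print(by_date)  # a SEARCH using the primary key (leading column matches)
print(by_part)  # a SCAN: PartID is not the leading key column
```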

A second example is the same table with the primary key columns in the opposite order:

CREATE TABLE Part_Price_Log (
    ModifiedDate DATE,
    PartID INT,
    PRIMARY KEY (PartID, ModifiedDate))

This is good for queries by PartID, but not for queries directly by ModifiedDate. Also, having PartID first means that whenever an inserted PartID is lower than the current maximum PartID, the row goes into the middle of the index instead of at the end, causing page splits (which increases fragmentation).
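Here is the same SQLite sketch with the key columns swapped. It is an approximation, not SQL Server. Looking up the latest row for one part now seeks on PartID and reads that part's rows already ordered by date. No separate sort step is needed; SQLite's plan would otherwise mention a TEMP B-TREE. Filtering on ModifiedDate alone falls back to a scan. The page-split cost on insert is a storage-engine effect and is not shown here. The plan helper is hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Part_Price_Log (
        ModifiedDate TEXT    NOT NULL,
        PartID       INTEGER NOT NULL,
        PRIMARY KEY (PartID, ModifiedDate)
    ) WITHOUT ROWID""")

# "Latest entry for a part": seek to the part, then read its rows in date order.
latest = plan(
    conn,
    "SELECT * FROM Part_Price_Log WHERE PartID = ? ORDER BY ModifiedDate DESC LIMIT 1",
    (7,),
)
# Date-only filter: ModifiedDate is not the leading key column, so a scan is needed.
by_date = plan(conn, "SELECT * FROM Part_Price_Log WHERE ModifiedDate = ?", ("2018-04-26",))
print(latest)
print(by_date)
```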

The last example would be using a surrogate primary key like an IDENTITY.

CREATE TABLE Part_Price_Log (
    LogID BIGINT IDENTITY PRIMARY KEY,
    ModifiedDate DATE,
    PartID INT)

This makes all inserts go to the end and reduces fragmentation, but you will need additional indexes to query your data, such as:

CREATE NONCLUSTERED INDEX NCI_Part_Price_Log_Date_PartID ON Part_Price_Log (ModifiedDate, PartID)
CREATE NONCLUSTERED INDEX NCI_Part_Price_Log_PartID_Date ON Part_Price_Log (PartID, ModifiedDate)
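Here is a sketch of the surrogate-key layout in SQLite (via Python's sqlite3), reusing the index names above. INTEGER PRIMARY KEY stands in for BIGINT IDENTITY. SQLite has no clustered/nonclustered distinction, so only the index choice is illustrated: each query should be answered from the index whose leading column it filters on. The plan helper is hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part_Price_Log (
        LogID        INTEGER PRIMARY KEY,  -- rowid alias: SQLite's stand-in for BIGINT IDENTITY
        ModifiedDate TEXT    NOT NULL,
        PartID       INTEGER NOT NULL
    );
    CREATE INDEX NCI_Part_Price_Log_Date_PartID ON Part_Price_Log (ModifiedDate, PartID);
    CREATE INDEX NCI_Part_Price_Log_PartID_Date ON Part_Price_Log (PartID, ModifiedDate);
""")

by_date = plan(conn, "SELECT * FROM Part_Price_Log WHERE ModifiedDate = ?", ("2018-04-26",))
by_part = plan(conn, "SELECT * FROM Part_Price_Log WHERE PartID = ?", (7,))
print(by_date)  # should name NCI_Part_Price_Log_Date_PartID
print(by_part)  # should name NCI_Part_Price_Log_PartID_Date
```

The trade-off from the next paragraph still applies: every insert now maintains two extra indexes.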

The con of this last approach is that inserts take longer (each index also has to be updated) and the table's total storage grows because of the extra indexes.

Also keep in mind that if your data allows multiple updates of the same part on the same day, then a compound PRIMARY KEY on (ModifiedDate, PartID) makes the second update fail. Your choices here are to use a surrogate key, use DATETIME instead of DATE (which narrows, but does not eliminate, the window for collisions), or use a CLUSTERED INDEX with no PRIMARY KEY or UNIQUE constraint.
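The failure mode for same-day updates can be reproduced in a few lines, assuming SQLite through Python's sqlite3. The try_two_updates function and the PriceCents column are hypothetical names for illustration. The composite key rejects the second change for the same part and day, while the surrogate-key table accepts it.

```python
import sqlite3

def try_two_updates(ddl: str) -> bool:
    """Log two price changes for the same part on the same day; True if both are accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    insert = "INSERT INTO Part_Price_Log (ModifiedDate, PartID, PriceCents) VALUES (?, ?, ?)"
    try:
        conn.execute(insert, ("2018-04-26", 7, 450))
        conn.execute(insert, ("2018-04-26", 7, 475))  # second change, same part, same day
        return True
    except sqlite3.IntegrityError:  # "UNIQUE constraint failed" on the composite key
        return False
    finally:
        conn.close()

composite_pk = """CREATE TABLE Part_Price_Log (
    ModifiedDate TEXT NOT NULL, PartID INTEGER NOT NULL, PriceCents INTEGER NOT NULL,
    PRIMARY KEY (ModifiedDate, PartID))"""

surrogate_pk = """CREATE TABLE Part_Price_Log (
    LogID INTEGER PRIMARY KEY,
    ModifiedDate TEXT NOT NULL, PartID INTEGER NOT NULL, PriceCents INTEGER NOT NULL)"""

print(try_two_updates(composite_pk))  # False
print(try_two_updates(surrogate_pk))  # True
```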


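I would suggest the following design. The clustered index is the table itself, so apart from the small nonclustered index that enforces the primary key, the data is stored only once. Because new rows carry the latest date, inserts land at (or very near) the end. Repeated ModifiedDate values for the same PartID are allowed, and your queries by date will be fast.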

CREATE TABLE Part_Price_Log (
    LogID INT IDENTITY PRIMARY KEY NONCLUSTERED,
    ModifiedDate DATE,
    PartID INT)

CREATE CLUSTERED INDEX CI_Part_Price_Log_Date_PartID ON Part_Price_Log (ModifiedDate, PartID)



Answer 3:


Without knowing your domain, it's really hard to advise. How do you identify a part in the real world? Let's assume you use EAN. This is your 'natural key'. Now, does a part get a new EAN each time the price changes? Probably not, in which case the real-world identifier for a part price is a composite of its EAN and the period of time during which that price was effective.

I think the comment about "an easy way to get errors in your data" refers to the fact that temporal databases are not only more complex by nature (they have an additional dimension: time), but support for temporal functionality is also lacking in most SQL DBMSs.

For example, does your SQL product of choice have an interval data type, or do you need to roll your own using a pair of start_date and end_date columns? Does it support intra-table constraints, e.g. to prevent overlapping or non-contiguous intervals for the same part? Does it have temporal functions to query temporal data easily?
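As an example of rolling your own, here is a minimal sketch in SQLite (via Python's sqlite3). SQLite has neither an interval type nor a declarative way to forbid overlaps. The table, trigger, EAN value and helper functions are all hypothetical. Periods are half-open [ValidFrom, ValidTo), so one period may end on the day the next begins. Only INSERT is guarded here; a complete version would need a matching BEFORE UPDATE trigger, and it does not rule out gaps between periods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical temporal table: a price is identified by the part's natural key (EAN)
    -- plus the half-open period [ValidFrom, ValidTo) during which it applied.
    CREATE TABLE Part_Price (
        EAN        TEXT    NOT NULL,
        ValidFrom  TEXT    NOT NULL,                        -- inclusive, 'YYYY-MM-DD'
        ValidTo    TEXT    NOT NULL DEFAULT '9999-12-31',   -- exclusive; far future = current
        PriceCents INTEGER NOT NULL,
        PRIMARY KEY (EAN, ValidFrom),
        CHECK (ValidFrom < ValidTo)
    );
    -- No exclusion constraint in SQLite, so overlapping periods are rejected by a trigger.
    CREATE TRIGGER Part_Price_no_overlap
    BEFORE INSERT ON Part_Price
    WHEN EXISTS (
        SELECT 1 FROM Part_Price p
        WHERE p.EAN = NEW.EAN
          AND p.ValidFrom < NEW.ValidTo
          AND NEW.ValidFrom < p.ValidTo
    )
    BEGIN
        SELECT RAISE(ABORT, 'overlapping price period for this EAN');
    END;
""")

def add_price(ean, valid_from, valid_to, cents):
    """Return True if the row was accepted, False if a constraint or the trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO Part_Price (EAN, ValidFrom, ValidTo, PriceCents) VALUES (?, ?, ?, ?)",
            (ean, valid_from, valid_to, cents),
        )
        return True
    except sqlite3.IntegrityError:  # RAISE(ABORT, ...) surfaces as a constraint error
        return False

def price_on(ean, day):
    """Price in effect on the given day, or None if no period covers it."""
    row = conn.execute(
        "SELECT PriceCents FROM Part_Price WHERE EAN = ? AND ValidFrom <= ? AND ? < ValidTo",
        (ean, day, day),
    ).fetchone()
    return None if row is None else row[0]

ean = "4006381333931"  # example EAN
results = [
    add_price(ean, "2018-01-01", "2018-03-01", 450),
    add_price(ean, "2018-03-01", "9999-12-31", 475),  # starts where the first ends: accepted
    add_price(ean, "2018-02-15", "2018-04-01", 500),  # overlaps both: rejected
]
print(results)  # [True, True, False]
```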




Answer 4:


I agree that it is better to keep an identity column/uniqueidentifier as the primary key in this scenario. Also, if you make PartID and the date a composite primary key, it will fail when two concurrent users try to update the same part's price at the same time: the second insert violates the primary key. So the better approach is to have an identity column as the primary key and keep appending the changes to the log table. If you hit performance limits later on, you can partition the table by year to overcome them.



Source: https://stackoverflow.com/questions/50043071/pros-cons-of-date-column-as-part-of-primary-key
