问题
I have heard a few references that pk is not required on fact table. I believe every single table should have a pk.
How could a person understand a row in a fact table if there is no pk and 10+ foreign keys.
回答1:
Primary Key is there
... but Enforcing the primary key
constraint in database level is not required.
If you think about this, technically a unique key or primary key is a key that uniquely defines the characteristics of each row. And it can be composed of more than one attributes of that entity. Now in the case of a Fact table, foreign keys
flowing-in from the other dimension tables together already act as a compounded primary key. And these foreign-key combinations can uniquely identify each record in the fact table. So, this foreign key combination is the primary key for the fact table.
Why not a Surrogate Key then?
Now if you wanted, you could have defined one surrogate key for the fact table. But what purpose would that serve? You are never going to retrieve one record from that fact table referring its surrogate key (use Indexes instead). Neither you are going to use that surrogate key to join the fact with other tables. Such a surrogate key will be completely waste of space in the database.
Enforcing Database Constraints
When you define this conceptual primary key in the database level, database needs to ensure that this constraint is not getting violated in any of the DML operation performed over it. Ensuring this constraint is a overhead for your database. It might be insignificant for an OLTP system, but for a large OLAP system where data are loaded in batch, this may incur significant performance penalties. Beside, why do you want your database to ensure the integrity of the constraints when you can ensure the same during the data loading phase itself (typically through your ETL coding).
回答2:
You are absolutely right that in principle a fact table should have a key. From the point of view of data modelling it is required. In implementation, key constraints in the database usually require an index however. The overhead of creating and maintaining indexes is such that the uniqueness of the "key" attributes is sometimes maintained by controls the integration layer ("ETL process") rather than by a constraint in the database.
Whenever practical it does make sense to create the key constraint within the database. If the key isn't explicitly defined in the database then it ought to be clearly documented for users so that they can understand what the data means.
来源:https://stackoverflow.com/questions/21288549/why-primary-key-is-not-required-on-fact-table-in-dimensional-modelling