In a star schema, are foreign key constraints between facts and dimensions neccessary?

前端 未结 7 1993
清酒与你
清酒与你 2021-01-02 19:33

I\'m getting my first exposure to data warehousing, and I’m wondering is it necessary to have foreign key constraints between facts and dimensions. Are there any major downs

相关标签:
7条回答
  • 2021-01-02 19:36

    As far as I know FKs, speed up queries. Also, many BI solutions exploit them in their integration layer. So for me they are a must in DWs.

    0 讨论(0)
  • 2021-01-02 19:37

    Most data-warehouses (DW) do not have foreign keys implemented as constraints, because:

    • In general, foreign key constraint would trigger on: an insert into a fact table, any key-updates, and a delete from a dimension table.

    • During loading, indexes and constraints are dropped to speed-up the loading process, data integrity is enforced by the ETL application.

    • Once tables are loaded, DW is essentially read-only -- the constraint does not trigger on reads.

    • Any required indexes are re-built after the loading.

    • Deleting in a DW is a controlled process. Before deleting rows from dimensions, fact tables are queried for keys of rows to be deleted -- deleting is allowed only if those keys do not exists in any of fact tables.

    Just in case, it is common to periodically run queries to detect orphan records in fact tables.

    0 讨论(0)
  • 2021-01-02 19:38

    I think in theory, you need that. But it depends on how you separate your data over database. If all of them in the same database, foreign key can help you because setting foreign key will help the database do selecting faster based on the indexing. If you share tables over many database, you need to check it on your application level

    You can have your database check it for you but it can be slow. And generally, in data warehouse, we don't care about redundancy or integrity. We already have a lot of data and a few integrity and redundancy will not affect the general aggregate data

    0 讨论(0)
  • 2021-01-02 19:46

    The reasons for using integrity constraints in a data warehouse are exactly the same as in any other database: to guarantee the integrity of the data. Assuming you and your users care about the data being accurate then you need some way of ensuring that it remains so and that business rules are being correctly applied.

    0 讨论(0)
  • 2021-01-02 19:53

    We use them, and we're happy with it.

    Is it good practice to have foreign keys in a datawarehouse (relationships)?

    There is overhead, but you can always disable the constraint during load and then re-enable it.

    Having the constraint in place can catch ETL bugs and modelling defects.

    0 讨论(0)
  • 2021-01-02 19:57

    Hope this thread is still active. My thinking is: for large fact tables with many dimensions and records, foreign keys will slow inserts and updates so that a fact table becomes too slow to load especially as it increases in size. Indexes are used for querying AFTER the table is loaded, so they can be disabled during inserts/updates and then rebuilt. The foreign key RELATION is important NOT the foreign key itself: this is really implicit in the ETL process. I have found that foreign keys make things TOO slow in the real world Datawarehouse. You need to use a VIRTUAL foreign key: the relation is their but not the constraint. If you damage the foreign key relations in a Datawarehouse you are doing something wrong. If you disable them during inserts and there is an mismatch or orphan, you won't be able to reenable them, so what's the point. The whole point of the DW is fast access and querying. Foreign keys make that impossible. Interesting debate: not easy to find this question on the Net Kev

    0 讨论(0)
提交回复
热议问题