Star schema [fact 1:n dimension]…how?

借酒劲吻你 2021-01-13 21:13

I am a newcomer to data warehouses and have what I hope is an easy question about building a star schema:

If I have a fact table where a fact record naturally has a

3 answers
  • 2021-01-13 21:46

    You should load a fact record for each promotion, even if the dollar amount is the same. If each type of promotion in your example really is represented by this specific dollar amount, then load a fact record that carries the key of the promotion type along with keys back to the other related dimensions (including Date).

    The main point here is: don't worry about data duplication. Think about a sales-oriented data warehouse for, say, a fast-food company. There won't be just one fact record for $4.13 standing in for a million distinct sales of "value meal #3". Instead, each record in the "Transaction" dimension would be related to at least one specific fact record in this hypothetical Sales fact table.

  • 2021-01-13 21:54

    Time is almost always a dimension in a star schema.

    "In effect" suggests that there is a start and end date for a Promotion.

    So a Promotion might itself be a fact that has a start and end date reference to the Time dimension.

    Maybe with a model like this you could have a JOIN table to relate Sale to Promotion in a many-to-many fashion between facts.
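    As a rough sketch of that idea, the Python below derives Sale-to-Promotion join rows by checking which promotions are in effect on the sale date. The `Promotion` class, the inclusive end date, and the function names are assumptions made for illustration; they are not part of the original question.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Promotion:
    promotion_id: int
    start: date
    end: date  # assumed inclusive


def promotions_in_effect(promotions: List[Promotion], sale_date: date) -> List[int]:
    """Return the IDs of promotions whose date range covers sale_date."""
    return [p.promotion_id for p in promotions if p.start <= sale_date <= p.end]


def sale_promotion_rows(
    sale_id: int, sale_date: date, promotions: List[Promotion]
) -> List[Tuple[int, int]]:
    """Build (sale_id, promotion_id) rows for a many-to-many join table."""
    return [(sale_id, pid) for pid in promotions_in_effect(promotions, sale_date)]


promos = [
    Promotion(1, date(2021, 1, 1), date(2021, 1, 31)),
    Promotion(2, date(2021, 1, 10), date(2021, 1, 15)),
    Promotion(3, date(2021, 2, 1), date(2021, 2, 28)),
]

# A sale on 13 January falls inside promotions 1 and 2 but not 3.
rows = sale_promotion_rows(500, date(2021, 1, 13), promos)
```

    Whether a promotion really applies to a sale would normally depend on more than the date (product, category, coupon), so treat the date check as the simplest possible rule.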

    "Many, many" Promotions - yes, but how large is that? One per day means 365 records per year. I'll assume that Promotions are associated somehow with Products or Categories. A Sale would have a timestamp and multiple Products.

    You have to store them somewhere, sometime or your model falls apart. Why the reluctance to model Promotion that way?

    My advice would be to not worry about the size of the data and concentrate on modeling the problem as best you can. Get the logical model right first, then worry about the physical model and the data sizes.

  • 2021-01-13 21:59

    For cases when you truly have a "multi-valued" dimension, a Bridge Table is usually the solution that Kimball recommends.

    Your "Promotion" dimension is simply a record of each promotion with its attributes (start date, end date, coupon code, POS promo code, ad name, etc.). The relationship from promo to product isn't modeled here, since it will be reflected in the fact table.

    The Promotion/Discount dimension would look like this (1 row per unique planned promotion):

    Promotion Dim ID
    Promo Code
    Coupon Code
    Promo Start DTTM
    Promo End DTTM
    ... etc ...
    

    Your Sales Fact would look like:

    Tran Date
    Tran Line #
    Customer Dim ID
    Product Dim ID
    Promotion Group Dim ID
    Net Sale Price
    Average Cost
    Discount Amount
    

    Your "Promotion Group" bridge table would then be the set of combinations:

    Promotion Group Dim ID
    Promotion Dim ID
    

    If a sale occurs that has 3 promotions on it, you simply create a group ID that relates to each promo, then put the group ID on the fact table. It's very similar to the way that medical reporting systems deal with multiple diagnoses.
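    One way to picture the group assignment is the minimal Python sketch below. It hands out one group ID per distinct combination of promotion IDs and reuses that ID when the same combination shows up again. The class name, the in-memory dictionary, and the sequential ID scheme are all assumptions for illustration; a real load process would typically keep this lookup in the warehouse itself.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


class PromotionGroupBridge:
    """Assigns one group ID per distinct combination of promotion IDs."""

    def __init__(self) -> None:
        self._group_by_combo: Dict[FrozenSet[int], int] = {}
        # Rows of the bridge table: (promotion_group_id, promotion_id)
        self.bridge_rows: List[Tuple[int, int]] = []

    def group_id_for(self, promotion_ids: Iterable[int]) -> int:
        combo = frozenset(promotion_ids)
        if combo in self._group_by_combo:
            # Same set of promotions seen before: reuse the existing group.
            return self._group_by_combo[combo]
        group_id = len(self._group_by_combo) + 1
        self._group_by_combo[combo] = group_id
        # One bridge row per promotion in the new group.
        for promo_id in sorted(combo):
            self.bridge_rows.append((group_id, promo_id))
        return group_id


bridge = PromotionGroupBridge()
sale_a_group = bridge.group_id_for([10, 20, 30])  # sale with 3 promotions
sale_b_group = bridge.group_id_for([30, 10, 20])  # same promotions, other order
sale_c_group = bridge.group_id_for([20])          # sale with a single promotion
```

    Here `sale_a_group` and `sale_b_group` share a group, so the fact rows for both sales carry the same Promotion Group Dim ID and the bridge table holds only three rows for that combination.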

    Note that a Bridge table makes it easy to double count sales, because joining the fact table through the bridge returns one row per promotion in the group. I advise that reports using this method be developed by people who understand the model.
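    The double-counting risk can be seen in a small sketch using Python's built-in `sqlite3` module. The table and column names are assumptions loosely based on the layouts above, and the single $5.00 sale is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion_group_bridge (
    promotion_group_id INTEGER,
    promotion_id INTEGER
);
CREATE TABLE sales_fact (
    tran_line INTEGER,
    promotion_group_id INTEGER,
    net_sale_price REAL
);
-- Group 1 contains three promotions.
INSERT INTO promotion_group_bridge VALUES (1, 10), (1, 20), (1, 30);
-- One sale line of 5.00 that used promotion group 1.
INSERT INTO sales_fact VALUES (1, 1, 5.0);
""")

# Total taken from the fact table alone.
true_total = conn.execute(
    "SELECT SUM(net_sale_price) FROM sales_fact"
).fetchone()[0]

# Total after joining through the bridge: the sale row is repeated
# once per promotion in its group.
joined_total = conn.execute("""
    SELECT SUM(f.net_sale_price)
    FROM sales_fact AS f
    JOIN promotion_group_bridge AS b
      ON b.promotion_group_id = f.promotion_group_id
""").fetchone()[0]

# Grouping by promotion is legitimate, but the per-promotion figures
# must not be added together to get overall sales.
per_promotion = conn.execute("""
    SELECT b.promotion_id, SUM(f.net_sale_price)
    FROM sales_fact AS f
    JOIN promotion_group_bridge AS b
      ON b.promotion_group_id = f.promotion_group_id
    GROUP BY b.promotion_id
    ORDER BY b.promotion_id
""").fetchall()

conn.close()
```

    In this sketch `true_total` is 5.0 while `joined_total` is 15.0, which is the overstatement a report author needs to watch for.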
