SQL: Normalization of database while retaining constraints

Submitted by 江枫思渺然 on 2019-12-10 14:54:41

Question


Suppose I have the following tables:

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |OrganismId (int, PK)|           |SpeciesId (int, PK) |
    |SpeciesId (int, FK) |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |____________________|                      1
              1                                 |
              |                                 |
              |                                 |
              ∞                                 ∞
    ______________________        ____________________          _______________
   | OrganismPropsValues  |      |   SpeciesProps     |        |     Props     |
   |----------------------|      |--------------------|        |---------------|
   |OrganismId (int, FK)  |      |PropId (int,PK,FK)  | ∞-----1|PropId (int,PK)|
   |PropId (int, FK)      |      |SpeciesId(int,PK,FK)|        |Name (varchar) |
   |Value (varchar)       |      |____________________|        |_______________|
   |______________________|                                             1
              ∞                                                         |
              |                                                         |
              -----------------------------------------------------------

A quick explanation of what I am trying to represent here: suppose we have a list of species, such as cat, dog, human, etc. We also have a set of properties (abbreviated Props so I could fit it more easily in the diagram) which apply to some but not necessarily all species; for example, this may be tail length (for species with tails), eye color (for those with eyes), etc.

SpeciesProps is a linker table that defines which properties apply to which species, so here we would have {Human, Eye Color}, {Dog, Eye Color}, {Cat, Eye Color}, {Dog, Tail Length}, {Cat, Tail Length}. We do not have {Human, Tail Length} because Tail Length is obviously not a valid property to apply to a human.

The Organisms table holds actual "implementations" of the species, so here we might have {Human, Bob}, {Dog, Rufus}, and {Cat, Felix}.

Here is my issue: in the OrganismPropsValues table, I want to store the 'values' of the properties for each organism. For example, for Bob I want to store {Bob, Eye Color, Blue}. For Rufus, I would want to store {Rufus, Eye Color, Brown} and {Rufus, Tail Length, 20} (and similarly for Felix). My problem, however, is that in the schema I have detailed, it is perfectly possible to store {Bob, Tail Length, 10}, even though the {Human, Tail Length} tuple does not exist in SpeciesProps. How can I modify this schema so that the constraints defined in SpeciesProps are enforced in OrganismPropsValues, while maintaining adequate normalization?
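To make the problem concrete, here is a minimal sketch of the schema above using Python's built-in sqlite3 module. SQLite is an assumption (the question names no RDBMS), TEXT stands in for varchar, and the IDs and sample rows are made up. The last INSERT stores {Bob, Tail Length, 10}, and SQLite accepts it, because OrganismPropsValues.PropId only references Props.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless this is on
conn.executescript("""
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Props   (PropId    INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SpeciesProps (
        PropId    INTEGER REFERENCES Props,
        SpeciesId INTEGER REFERENCES Species,
        PRIMARY KEY (SpeciesId, PropId)
    );
    CREATE TABLE Organisms (
        OrganismId INTEGER PRIMARY KEY,
        SpeciesId  INTEGER NOT NULL REFERENCES Species,
        Name       TEXT NOT NULL
    );
    -- PropId references Props only; nothing links it to SpeciesProps.
    CREATE TABLE OrganismPropsValues (
        OrganismId INTEGER REFERENCES Organisms,
        PropId     INTEGER REFERENCES Props,
        Value      TEXT
    );
    INSERT INTO Species VALUES (1, 'Human'), (2, 'Dog');
    INSERT INTO Props   VALUES (1, 'Eye Color'), (2, 'Tail Length');
    INSERT INTO SpeciesProps (SpeciesId, PropId) VALUES (1, 1), (2, 1), (2, 2);
    INSERT INTO Organisms VALUES (1, 1, 'Bob'), (2, 2, 'Rufus');
""")

# {Bob, Tail Length, 10}: every FK is satisfied, so the row is accepted
# even though {Human, Tail Length} is not in SpeciesProps.
conn.execute("INSERT INTO OrganismPropsValues VALUES (1, 2, '10')")
```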


Answer 1:


You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.

What I would suggest instead is the Class Table Inheritance design pattern:

  • Create one table for Organisms, containing properties common to all species.
  • Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |OrganismId (int, PK)|           |SpeciesId (int, PK) |
    |SpeciesId (int, FK) |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |____________________|
              1
              |
              |
              1
     ______________________ 
    |    HumanOrganism     |
    |----------------------|
    |OrganismId (int,PK,FK)|
    |Sex      (enum)       |
    |Race     (int, FK)    |
    |EyeColor (int, FK)    |
    |....                  |
    |______________________|
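A minimal sketch of the Class Table Inheritance layout, again assuming SQLite via Python's sqlite3. HumanOrganism is cut down to one column, and DogOrganism is a hypothetical second per-species table (the answer only shows HumanOrganism). Making OrganismId both the PK and an FK is what makes each per-species row 1-to-1 with its Organisms row. Because HumanOrganism has no TailLength column, {Bob, Tail Length, 10} cannot even be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Properties common to all organisms.
    CREATE TABLE Organisms (
        OrganismId INTEGER PRIMARY KEY,
        SpeciesId  INTEGER NOT NULL REFERENCES Species,
        Name       TEXT NOT NULL
    );
    -- One table per species; sharing the PK makes the link 1-to-1.
    CREATE TABLE HumanOrganism (
        OrganismId INTEGER PRIMARY KEY REFERENCES Organisms,
        EyeColor   TEXT
    );
    -- Hypothetical table for dogs; TailLength gets a real numeric type.
    CREATE TABLE DogOrganism (
        OrganismId INTEGER PRIMARY KEY REFERENCES Organisms,
        EyeColor   TEXT,
        TailLength INTEGER
    );
    INSERT INTO Species VALUES (1, 'Human'), (2, 'Dog');
    INSERT INTO Organisms VALUES (1, 1, 'Bob'), (2, 2, 'Rufus');
    INSERT INTO HumanOrganism VALUES (1, 'Blue');
    INSERT INTO DogOrganism VALUES (2, 'Brown', 20);
""")

# There is no column in which to store a human's tail length.
try:
    conn.execute("INSERT INTO HumanOrganism (OrganismId, TailLength) VALUES (1, 10)")
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
```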
    

This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:

  • You can use SQL data types appropriately, instead of treating everything as a free-form varchar.
  • You can use constraints or lookup tables to restrict certain properties by a predefined set of values.
  • You can make properties mandatory (i.e. NOT NULL) or use other constraints.
  • Data and indexes are stored more efficiently.
  • Queries are easier for you to write and easier for the RDBMS to execute.
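The constraint-related benefits can be illustrated with a small sketch (Python's sqlite3 again, an assumption). Since SQLite has no ENUM type, a CHECK constraint stands in for the Sex enum, and a hypothetical EyeColors lookup table restricts EyeColor; both columns are NOT NULL. The codes 'F' and 'M' and all IDs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Organisms (OrganismId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Hypothetical lookup table: the predefined set of eye colors.
    CREATE TABLE EyeColors (EyeColorId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE HumanOrganism (
        OrganismId INTEGER PRIMARY KEY REFERENCES Organisms,
        Sex        TEXT NOT NULL CHECK (Sex IN ('F', 'M')),  -- stands in for an enum
        EyeColor   INTEGER NOT NULL REFERENCES EyeColors     -- mandatory, from the lookup
    );
    INSERT INTO Organisms VALUES (1, 'Bob'), (2, 'Alice');
    INSERT INTO EyeColors VALUES (1, 'Blue'), (2, 'Brown');
    INSERT INTO HumanOrganism VALUES (1, 'M', 1);
""")

def try_insert(sql: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO HumanOrganism VALUES (2, 'F', 99)"))    # False: unknown eye color
print(try_insert("INSERT INTO HumanOrganism VALUES (2, 'X', 1)"))     # False: Sex not allowed
print(try_insert("INSERT INTO HumanOrganism VALUES (2, 'F', NULL)"))  # False: missing value
```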

For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.




Answer 2:


Hmm...
Here is one way to do it:
  • Add a SpeciesPropsId key column to the SpeciesProps table.
  • Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
  • Change the constraints accordingly: add a foreign key from OrganismPropsValues to SpeciesProps, and remove the foreign key from OrganismPropsValues to Props.

Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it, the data will be redundant.
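A minimal sketch of these steps, assuming SQLite via Python's sqlite3; the SpeciesPropsId values (10, 20, 21) and the UNIQUE constraint on (SpeciesId, PropId) are illustrative choices. As sketched, the new foreign key guarantees that every value refers to a declared (species, property) pair, but on its own it does not compare that pair's species with the organism's species; the last INSERT shows Bob being given the dog's Tail Length entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Props   (PropId    INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SpeciesProps (
        SpeciesPropsId INTEGER PRIMARY KEY,           -- the new key column
        SpeciesId      INTEGER NOT NULL REFERENCES Species,
        PropId         INTEGER NOT NULL REFERENCES Props,
        UNIQUE (SpeciesId, PropId)                    -- each pair still declared once
    );
    CREATE TABLE Organisms (
        OrganismId INTEGER PRIMARY KEY,
        SpeciesId  INTEGER NOT NULL REFERENCES Species,
        Name       TEXT NOT NULL
    );
    -- PropId replaced by SpeciesPropsId; the FK now points at SpeciesProps.
    CREATE TABLE OrganismPropsValues (
        OrganismId     INTEGER REFERENCES Organisms,
        SpeciesPropsId INTEGER REFERENCES SpeciesProps,
        Value          TEXT,
        PRIMARY KEY (OrganismId, SpeciesPropsId)
    );
    INSERT INTO Species VALUES (1, 'Human'), (2, 'Dog');
    INSERT INTO Props   VALUES (1, 'Eye Color'), (2, 'Tail Length');
    -- 10 = {Human, Eye Color}, 20 = {Dog, Eye Color}, 21 = {Dog, Tail Length}
    INSERT INTO SpeciesProps VALUES (10, 1, 1), (20, 2, 1), (21, 2, 2);
    INSERT INTO Organisms VALUES (1, 1, 'Bob'), (2, 2, 'Rufus');
""")

conn.execute("INSERT INTO OrganismPropsValues VALUES (2, 21, '20')")  # Rufus' tail length

# A value must reference an existing SpeciesProps row...
try:
    conn.execute("INSERT INTO OrganismPropsValues VALUES (1, 99, '10')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# ...but nothing checks that row's species against Bob's, so this is accepted.
conn.execute("INSERT INTO OrganismPropsValues VALUES (1, 21, '10')")
```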




Answer 3:


Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.

Specifically, identify the Organism not just by OrganismId, but by the combination of SpeciesId and OrganismSubId (you can still have OrganismId, but keep it as an alternate key; it is not shown here for brevity).

Once you do that, your model can be made to look like this:

The key thing to note here is that SpeciesId is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able to "assign a value" to a property that was not "declared" for the given species.
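One possible rendering of the composite-key model described above (not the answer's own diagram), assuming SQLite via Python's sqlite3 and using singular table names. OrganismId is kept as a UNIQUE alternate key. The value table stores SpeciesId once and uses it in both foreign keys, so a row must match the organism's species and a property declared for that same species.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Prop    (PropId    INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SpeciesProp (
        SpeciesId INTEGER REFERENCES Species,
        PropId    INTEGER REFERENCES Prop,
        PRIMARY KEY (SpeciesId, PropId)
    );
    CREATE TABLE Organism (
        SpeciesId     INTEGER NOT NULL REFERENCES Species,
        OrganismSubId INTEGER NOT NULL,
        OrganismId    INTEGER NOT NULL UNIQUE,       -- alternate key
        Name          TEXT NOT NULL,
        PRIMARY KEY (SpeciesId, OrganismSubId)
    );
    -- SpeciesId arrives down both edges of the diamond and is stored once.
    CREATE TABLE OrganismPropValue (
        SpeciesId     INTEGER NOT NULL,
        OrganismSubId INTEGER NOT NULL,
        PropId        INTEGER NOT NULL,
        Value         TEXT,
        PRIMARY KEY (SpeciesId, OrganismSubId, PropId),
        FOREIGN KEY (SpeciesId, OrganismSubId) REFERENCES Organism (SpeciesId, OrganismSubId),
        FOREIGN KEY (SpeciesId, PropId)        REFERENCES SpeciesProp (SpeciesId, PropId)
    );
    INSERT INTO Species VALUES (1, 'Human'), (2, 'Dog');
    INSERT INTO Prop    VALUES (1, 'Eye Color'), (2, 'Tail Length');
    INSERT INTO SpeciesProp VALUES (1, 1), (2, 1), (2, 2);
    INSERT INTO Organism VALUES (1, 1, 100, 'Bob'), (2, 1, 200, 'Rufus');
""")

conn.execute("INSERT INTO OrganismPropValue VALUES (2, 1, 2, '20')")  # Rufus: Tail Length

try:  # Bob is (Human, 1), and {Human, Tail Length} is not in SpeciesProp
    conn.execute("INSERT INTO OrganismPropValue VALUES (1, 1, 2, '10')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```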

BTW, use singular names for your tables. Also, consider using natural primary keys (e.g. SpeciesName instead of SpeciesId as the PK); if done right, it can significantly increase the speed of your JOINs (especially in conjunction with clustering).
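A minimal sketch of the naming advice, again assuming SQLite via Python's sqlite3: singular table names, with the natural keys SpeciesName and PropName replacing the integer IDs (OrganismSubId is kept, since organism names need not be unique). One visible effect is that the child rows already carry readable species and property names, so reading them needs no JOIN. The sketch says nothing about speed, which depends on the RDBMS and on clustering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (SpeciesName TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE Prop    (PropName    TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE SpeciesProp (
        SpeciesName TEXT NOT NULL REFERENCES Species,
        PropName    TEXT NOT NULL REFERENCES Prop,
        PRIMARY KEY (SpeciesName, PropName)
    );
    CREATE TABLE Organism (
        SpeciesName   TEXT    NOT NULL REFERENCES Species,
        OrganismSubId INTEGER NOT NULL,
        Name          TEXT    NOT NULL,
        PRIMARY KEY (SpeciesName, OrganismSubId)
    );
    CREATE TABLE OrganismPropValue (
        SpeciesName   TEXT    NOT NULL,
        OrganismSubId INTEGER NOT NULL,
        PropName      TEXT    NOT NULL,
        Value         TEXT,
        PRIMARY KEY (SpeciesName, OrganismSubId, PropName),
        FOREIGN KEY (SpeciesName, OrganismSubId) REFERENCES Organism (SpeciesName, OrganismSubId),
        FOREIGN KEY (SpeciesName, PropName)      REFERENCES SpeciesProp (SpeciesName, PropName)
    );
    INSERT INTO Species VALUES ('Dog'), ('Human');
    INSERT INTO Prop    VALUES ('Eye Color'), ('Tail Length');
    INSERT INTO SpeciesProp VALUES ('Human', 'Eye Color'), ('Dog', 'Eye Color'), ('Dog', 'Tail Length');
    INSERT INTO Organism VALUES ('Human', 1, 'Bob'), ('Dog', 1, 'Rufus');
    INSERT INTO OrganismPropValue VALUES ('Dog', 1, 'Eye Color', 'Brown'), ('Dog', 1, 'Tail Length', '20');
""")

# Species and property names are part of the key, so no JOIN is needed to read them.
for row in conn.execute(
        "SELECT SpeciesName, PropName, Value FROM OrganismPropValue ORDER BY PropName"):
    print(row)
```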




Answer 4:


Another way to achieve these constraints would be to change the PK of the Organisms table by dropping OrganismId and adding a No column. Then make the PK the compound key (SpeciesId, No). So "Bob" would be (Human, 1), "Rufus" would be (Dog, 1), etc.

Then, add SpeciesId and No to the OrganismPropsValues table (removing OrganismId).

This allows you to change the FK from OrganismPropsValues to Props so that it references SpeciesProps instead:

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |SpeciesId (int, FK) |           |SpeciesId (int, PK) |
    |No (int)            |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |PK (SpeciesId,No)   |                      1
    |____________________|                      |
              1                                 |
              |                                 |
              |                                 |
              ∞                                 ∞
    ______________________        ____________________          _______________
   | OrganismPropsValues  |      |   SpeciesProps     |        |     Props     |
   |----------------------|      |--------------------|        |---------------|
   |SpeciesId (int, PK)   |      |PropId (int,PK,FK)  | ∞-----1|PropId (int,PK)|
   |No (int, PK)          |      |SpeciesId(int,PK,FK)|        |Name (varchar) |
   |PropId (int, PK)      |      |____________________|        |_______________|
   |Value (varchar)       |                 1
   |FK (SpeciesId,No)     |                 |
   |FK (SpeciesId,PropId) |                 |
   |______________________|                 |
              ∞                             |
              |                             |
              -------------------------------
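The diagram above translates fairly directly into DDL. A minimal sketch, assuming SQLite via Python's sqlite3; "No" is double-quoted because NO is an SQLite keyword, the store() helper is hypothetical, and the sample rows follow the question (Felix added as (Cat, 1)). The composite FK on (SpeciesId, PropId) is what rejects {Bob, Tail Length, 10}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (SpeciesId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Props   (PropId    INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE SpeciesProps (
        PropId    INTEGER REFERENCES Props,
        SpeciesId INTEGER REFERENCES Species,
        PRIMARY KEY (SpeciesId, PropId)
    );
    CREATE TABLE Organisms (
        SpeciesId INTEGER NOT NULL REFERENCES Species,
        "No"      INTEGER NOT NULL,            -- numbering restarts for each species
        Name      TEXT NOT NULL,
        PRIMARY KEY (SpeciesId, "No")
    );
    CREATE TABLE OrganismPropsValues (
        SpeciesId INTEGER NOT NULL,
        "No"      INTEGER NOT NULL,
        PropId    INTEGER NOT NULL,
        Value     TEXT,
        PRIMARY KEY (SpeciesId, "No", PropId),
        FOREIGN KEY (SpeciesId, "No")   REFERENCES Organisms (SpeciesId, "No"),
        FOREIGN KEY (SpeciesId, PropId) REFERENCES SpeciesProps (SpeciesId, PropId)
    );
    INSERT INTO Species VALUES (1, 'Human'), (2, 'Dog'), (3, 'Cat');
    INSERT INTO Props   VALUES (1, 'Eye Color'), (2, 'Tail Length');
    INSERT INTO SpeciesProps (SpeciesId, PropId) VALUES (1, 1), (2, 1), (2, 2), (3, 1), (3, 2);
    -- Bob = (Human, 1), Rufus = (Dog, 1), Felix = (Cat, 1)
    INSERT INTO Organisms VALUES (1, 1, 'Bob'), (2, 1, 'Rufus'), (3, 1, 'Felix');
""")

def store(species_id: int, no: int, prop_id: int, value: str) -> bool:
    """Try to store a property value; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO OrganismPropsValues VALUES (?, ?, ?, ?)",
                     (species_id, no, prop_id, value))
        return True
    except sqlite3.IntegrityError:
        return False

print(store(1, 1, 1, "Blue"))  # True: {Bob, Eye Color, Blue}
print(store(2, 1, 2, "20"))    # True: {Rufus, Tail Length, 20}
print(store(1, 1, 2, "10"))    # False: {Human, Tail Length} is not in SpeciesProps
```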


Source: https://stackoverflow.com/questions/7183039/sql-normalization-of-database-while-retaining-constraints
