How do I get around this relational database design smell?

Submitted by 自古美人都是妖i on 2020-01-30 09:20:09

Question


I have a really simple mediaTypes table which contains the following columns:

id string
name string

Each mediaType record can have many "placements", which could easily be designed as follows:

Placements

id string
mediaTypeId string (links to mediaTypes.id)
name string
detail_col_1
detail_col_2
...etc

However depending on the media type, a placement can contain different details, so if I designed the schema this way I may end up with a lot of nullable columns.

To get around this, I could have an aPlacements table and a bPlacements table to match each different media type.

aPlacements

  id string
  mediaTypeId string (links to mediaTypes.id)
  name string
  placement_details_relevant_to_media_type_col_1
  placement_details_relevant_to_media_type_col_2

bPlacements

  id string
  mediaTypeId string (links to mediaTypes.id)
  name string
  placement_details_relevant_to_media_type_col_1
  placement_details_relevant_to_media_type_col_2

The drawback of this is how would I then find a placement by id as I'd have to query across all tables:

SELECT * FROM aPlacements WHERE id = '1234'
UNION ALL
SELECT * FROM bPlacements WHERE id = '1234'
etc

The whole design feels like a bit of a design smell. Any suggestions on how I could clean up this schema?


Answer 1:


Noting the Relational database tag.

"The whole design feels like a bit of a design smell"

Yes. It smells for two reasons.

  1. You have ids as Identifiers in each table. That will confuse you, and make for code that is easy to screw up. For an Identifier:
    • name it for the thing that it Identifies
      eg. mediaType, placementCode (they are strings, which is correct)
    • where it is located as a Foreign Key, name it exactly the same, so that there is no confusion about what the column is, and what PK it references

"However depending on the mediaType, a placement can contain different details"

  2. What you are seeking, in logical terms, is an OR Gate.
    In Relational terms, it is a Subtype, here an Exclusive Subtype.
    That is, with complete integrity and constraints.
    mediaType is the Discriminator.

"if I designed the schema this way I may end up with a lot of nullable columns."

Yes, you are correct. Nullable columns indicate that the modelling exercise, Normalisation, is incomplete. Two Subtype tables is correct.

Relational Data Model

Note • Notation

  • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993

  • My IDEF1X Introduction is essential reading for beginners

Note • Content

  • Exclusive Subtype

    • Each Placement is either a PlacementA xor a PlacementB
    • Refer to Subtype for full details on Subtype implementation.
  • Relational Key

    • They are strings, as you have given.
    • They are "made up from the data", as required by the Relational Model.
    • Such Keys are Logical, they ensure the rows are unique.
    • Further they provide Relational Integrity (as distinct from Referential Integrity), which cannot be shown here, in this small data model.
    • Note that IDs that are manufactured by the system, which is NOT data, and NOT seen by the user, are physical, pointing to Records (not logical rows). They provide record uniqueness but not row uniqueness. They cannot provide Relational integrity.
    • The RM requires that rows (not records) are unique.
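
As a rough sketch of the Subtype structure described in these notes (not the author's actual DDL, which is in the diagram and the linked documents), the tables could be declared along the following lines. Python's built-in sqlite3 module is used only so that the example runs anywhere. The Name column, the ColumnA*/ColumnB* detail columns and the nullable_columns helper are placeholders assumed for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE MediaType (
    MediaType TEXT NOT NULL PRIMARY KEY,   -- Discriminator value, e.g. 'A' or 'B'
    Name      TEXT NOT NULL
);
CREATE TABLE Placement (                   -- Basetype: columns common to all placements
    Placement TEXT NOT NULL PRIMARY KEY,
    MediaType TEXT NOT NULL REFERENCES MediaType (MediaType),
    Name      TEXT NOT NULL
);
CREATE TABLE PlacementA (                  -- Subtype A: only what type A needs
    Placement TEXT NOT NULL PRIMARY KEY REFERENCES Placement (Placement),
    ColumnA1  TEXT NOT NULL,
    ColumnA2  TEXT NOT NULL
);
CREATE TABLE PlacementB (                  -- Subtype B: a different set of columns
    Placement TEXT NOT NULL PRIMARY KEY REFERENCES Placement (Placement),
    ColumnB1  TEXT NOT NULL,
    ColumnB2  TEXT NOT NULL,
    ColumnB3  TEXT NOT NULL
);
"""


def nullable_columns(conn, table):
    """List the columns of `table` that still allow NULL."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute(f"PRAGMA table_info({table})")
    return [row[1] for row in info if row[3] == 0]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

for table in ("MediaType", "Placement", "PlacementA", "PlacementB"):
    print(table, nullable_columns(conn, table))
```

Each Subtype carries only its own columns, so none of them has to be declared nullable. Note that the keys are named for what they identify (Placement, MediaType) and keep the same name wherever they appear as a Foreign Key.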

SQL

"The drawback of this is how would I then find a placement by id as I'd have to query across all tables:"

Upgraded as per above, that would be:

"The drawback of this is how would I then find the relevant Placement columns by the PK Placement, as I'd have to query across all tables:"

First, understand that SQL works perfectly for Relational databases, but it is, by its nature, a low-level language. Most of us in the real world use an IDE (I don't know anyone who does not), thus much of its cumbersomeness is eased, and many coding errors are eliminated.

Where we have to code SQL directly, yes, that is what you have to do. Get used to it. There are just two tables here.

Your code will not work: it assumes the columns have identical datatypes and are in the same order (which UNION requires). They are not.

  • Do not force them to be, just to make your UNION succeed. There may well be additional columns in one or the other Subtype, later on, and then your code will break, badly, everywhere that it is deployed.

  • For code that is implemented, never use asterisk in a SELECT (it is fine for development only). That guarantees failure when the database changes. Always use a column list, and request only the columns you need.

SELECT Placement,
       ColumnA1,
       ColumnA2,
       '' AS ColumnB1,   -- placeholders for the PlacementB-only columns
       '' AS ColumnB2,
       '' AS ColumnB3
    FROM  PlacementA
    WHERE Placement = 'ABCD'
--
UNION
--
SELECT Placement,
       '',               -- placeholders for the PlacementA-only columns
       '',
       ColumnB1,
       ColumnB2,
       ColumnB3
    FROM  PlacementB
    WHERE Placement = 'ABCD'
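
To see the explicit-column-list lookup run end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the table definitions and the find_placement helper are assumptions made up for the demonstration; only the shape of the query follows the answer.

```python
import sqlite3

# Explicit column lists on both sides; each side pads the other Subtype's
# columns with empty strings so the two SELECTs line up for the UNION.
LOOKUP_SQL = """
SELECT Placement, ColumnA1, ColumnA2,
       '' AS ColumnB1, '' AS ColumnB2, '' AS ColumnB3
    FROM  PlacementA
    WHERE Placement = :code
UNION
SELECT Placement, '', '', ColumnB1, ColumnB2, ColumnB3
    FROM  PlacementB
    WHERE Placement = :code
"""


def find_placement(conn, code):
    """Return the Subtype row for `code`, whichever Subtype table holds it."""
    return conn.execute(LOOKUP_SQL, {"code": code}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PlacementA (
    Placement TEXT NOT NULL PRIMARY KEY,
    ColumnA1  TEXT NOT NULL,
    ColumnA2  TEXT NOT NULL
);
CREATE TABLE PlacementB (
    Placement TEXT NOT NULL PRIMARY KEY,
    ColumnB1  TEXT NOT NULL,
    ColumnB2  TEXT NOT NULL,
    ColumnB3  TEXT NOT NULL
);
INSERT INTO PlacementA VALUES ('ABCD', 'a1', 'a2');
INSERT INTO PlacementB VALUES ('WXYZ', 'b1', 'b2', 'b3');
""")

print(find_placement(conn, "ABCD"))
print(find_placement(conn, "WXYZ"))
```

Because the column list is spelled out, adding a column to either Subtype later does not break this query; it simply is not returned until someone asks for it.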

View

The Relational Model, and SQL its data sublanguage, has the concept of a View. This is how one would use it. Each Basetype and Subtype combination is considered a single unit, a single row.

CREATE VIEW PlacementA_V
AS
    SELECT  BASE.Placement,     -- qualified: Placement exists in both tables
            BASE.MediaType,
            BASE.ColumnCommon,
            SUBA.ColumnA1,
            SUBA.ColumnA2
        FROM Placement  BASE
        JOIN PlacementA SUBA
            ON BASE.Placement = SUBA.Placement
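
A small runnable sketch of such a View, again with Python's sqlite3 module standing in for a real server. The table definitions and sample rows are assumed for illustration; the View itself follows the one above, with each column qualified by its table alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Placement (
    Placement    TEXT NOT NULL PRIMARY KEY,
    MediaType    TEXT NOT NULL,
    ColumnCommon TEXT NOT NULL
);
CREATE TABLE PlacementA (
    Placement TEXT NOT NULL PRIMARY KEY REFERENCES Placement (Placement),
    ColumnA1  TEXT NOT NULL,
    ColumnA2  TEXT NOT NULL
);
CREATE VIEW PlacementA_V
AS
    SELECT  BASE.Placement,
            BASE.MediaType,
            BASE.ColumnCommon,
            SUBA.ColumnA1,
            SUBA.ColumnA2
        FROM Placement  BASE
        JOIN PlacementA SUBA
            ON BASE.Placement = SUBA.Placement;

INSERT INTO Placement  VALUES ('ABCD', 'A', 'common-1');
INSERT INTO PlacementA VALUES ('ABCD', 'a1', 'a2');
INSERT INTO Placement  VALUES ('WXYZ', 'B', 'common-2');  -- a type B placement
""")

# Callers read the Basetype-Subtype pair as a single row through the View.
row = conn.execute(
    "SELECT Placement, MediaType, ColumnCommon, ColumnA1, ColumnA2 "
    "FROM PlacementA_V WHERE Placement = ?",
    ("ABCD",),
).fetchone()
total = conn.execute("SELECT COUNT(*) FROM PlacementA_V").fetchone()[0]

print(row)
print(total)
```

The type B placement has no row in PlacementA, so the inner join leaves it out of PlacementA_V; a matching PlacementB_V would be built the same way.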

Enjoy.


Comments

"In postgres, is there a way I could setup a constraint where the placement can ONLY exist in either PlacementA OR PlacementB and not both?"

  1. That is Exclusivity.

    • If you read the linked Subtype doc, I have given a full explanation and technical details for implementation in SQL, including all code (follow the links in each document). It consists of a CONSTRAINT that calls a FUNCTION:

      ALTER TABLE ProductBook  -- subtype
      ADD CONSTRAINT ProductBook_Excl_ck
       -- check an existential  condition, which calls
          -- function using PK & discriminator
       CHECK ( dbo.ValidateExclusive_fn ( ProductId, "B" ) = 1 )
      
    • We have had that capability in SQL for over 15 years in my experience.

  2. PusgreNONsql is not SQL compliant in many areas.
    None of the freeware/shareware/vapourware/noware is SQL compliant (their use of the term SQL is fraudulent). They do not have a Server Architecture, most do not have ACID Transactions, etc.
    Therefore, no. It cannot call a Function from DDL.

  3. As long as you understand and implement Standards, such as Open Architecture, to the degree possible in your particular NONsql suite (it cannot be labelled a platform because it has no Server Architecture), that is the best you can do.

  4. The Open Architecture Standard demands:

    • no direct INSERT/UPDATE/DELETE to the tables
    • all your writes to the db are done via OLTP Transactions

      • which in SQL means:
        Stored Procedures with BEGIN TRAN ... COMMIT/ROLLBACK TRAN
      • but in PusgreNONsql means:
        Functions which are supposed to be "atomic"
        (quotes because it is nowhere near the Atomic that is implemented in SQL ACID Transactions [the A in ACID stands for Atomic] )
  5. Therefore, take the Exclusivity code in the Function I have given in SQL, and:

    • deploy it in every "atomic" Function that INSERT/DELETEs to the Basetype or Subtype tables in your pretend sql suite.
      (I do not allow UPDATE to a Key, refer CASCADE above.)

    • while we are here, it must be mentioned, such "atomic" Functions need to likewise have code to ensure that the Basetype-Subtype pair is INSERT/DELETEd as a pair or not at all.
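
Point 5 can be sketched roughly as follows. This is not the code from the linked Subtype document: it is an illustrative stand-in written with Python's sqlite3 module, where validate_exclusive plays the role of the ValidateExclusive_fn check and the two write functions play the role of the "atomic" Functions. All table definitions, function names and sample values here are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Placement (
    Placement TEXT NOT NULL PRIMARY KEY,
    MediaType TEXT NOT NULL CHECK (MediaType IN ('A', 'B')),  -- Discriminator
    Name      TEXT NOT NULL
);
CREATE TABLE PlacementA (
    Placement TEXT NOT NULL PRIMARY KEY REFERENCES Placement (Placement),
    ColumnA1  TEXT NOT NULL,
    ColumnA2  TEXT NOT NULL
);
CREATE TABLE PlacementB (
    Placement TEXT NOT NULL PRIMARY KEY REFERENCES Placement (Placement),
    ColumnB1  TEXT NOT NULL
);
"""


def validate_exclusive(conn, placement, media_type):
    """Return 1 if the Basetype row exists with this Discriminator, else 0."""
    row = conn.execute(
        "SELECT 1 FROM Placement WHERE Placement = ? AND MediaType = ?",
        (placement, media_type),
    ).fetchone()
    return 1 if row else 0


def add_placement_a(conn, placement, name, column_a1, column_a2):
    """Write the Basetype-Subtype pair as a pair, or not at all."""
    with conn:  # one transaction: COMMIT on success, ROLLBACK on any exception
        conn.execute(
            "INSERT INTO Placement (Placement, MediaType, Name) VALUES (?, 'A', ?)",
            (placement, name),
        )
        conn.execute(
            "INSERT INTO PlacementA (Placement, ColumnA1, ColumnA2) VALUES (?, ?, ?)",
            (placement, column_a1, column_a2),
        )


def add_subtype_b(conn, placement, column_b1):
    """Attach a B Subtype row only if the Basetype Discriminator is 'B'."""
    with conn:
        if validate_exclusive(conn, placement, "B") != 1:
            raise ValueError(f"{placement!r} is not a MediaType 'B' placement")
        conn.execute(
            "INSERT INTO PlacementB (Placement, ColumnB1) VALUES (?, ?)",
            (placement, column_b1),
        )


def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

add_placement_a(conn, "ABCD", "Front page", "a1", "a2")

# Exclusivity: 'ABCD' is a type A placement, so it may not also get a B row.
try:
    add_subtype_b(conn, "ABCD", "b1")
    rejected = False
except ValueError:
    rejected = True

# Pair or nothing: the Subtype insert fails (NULL in a NOT NULL column),
# so the Basetype insert made in the same transaction is rolled back too.
try:
    add_placement_a(conn, "EFGH", "Sidebar", None, "a2")
except sqlite3.IntegrityError:
    pass
orphans = conn.execute(
    "SELECT COUNT(*) FROM Placement WHERE Placement = 'EFGH'"
).fetchone()[0]

print(rejected, orphans, count_rows(conn, "PlacementB"))
```

The guarantee only holds if every write goes through functions like these, which is why the Open Architecture point above rules out direct INSERT/UPDATE/DELETE against the tables.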




Answer 2:


Maybe this is a subjective solution. If the Placements table does not have many columns, e.g. (detail_col_1, detail_col_2, detail_col_3 ... detail_col_6), the single-table design is not that bad. It does not really matter how many null columns you end up with: it may look ugly, but it should work. If you want a more elaborate approach, I'd suggest one of these:

  1. Simple Placements table with json in it:
MediaTypes
+ id
+ name

Placements
+ id
+ mediaTypeId
+ name
+ detail

In detail I can define my attributes as JSON, and set the appropriate values for each type:

row 1: {"attr1": valx, "attr2": valy}
row 2: {"attr4": valz, "attr1": valw}

The problem here is filtering: you cannot easily query by the attributes inside the JSON. This approach should work if you only want to store extra information.
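
A minimal sketch of this JSON-column variant, using Python's sqlite3 and json modules. The media type ids, attribute names and values are invented for the example, and the placements_with helper is hypothetical. It filters in application code, which illustrates the drawback mentioned above: every row is fetched and parsed before it can be tested.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Placements (
    id          TEXT NOT NULL PRIMARY KEY,
    mediaTypeId TEXT NOT NULL,
    name        TEXT NOT NULL,
    detail      TEXT NOT NULL   -- JSON document stored as plain text
);
""")

samples = [
    ("p1", "video", "Pre-roll", {"attr1": 30, "attr2": "16:9"}),
    ("p2", "print", "Back cover", {"attr4": "A4", "attr1": 1}),
]
conn.executemany(
    "INSERT INTO Placements (id, mediaTypeId, name, detail) VALUES (?, ?, ?, ?)",
    [(pid, media, name, json.dumps(detail)) for pid, media, name, detail in samples],
)


def placements_with(conn, attr, value):
    """Return ids whose detail JSON has attr == value (filtered in Python)."""
    matches = []
    for pid, detail in conn.execute("SELECT id, detail FROM Placements ORDER BY id"):
        if json.loads(detail).get(attr) == value:
            matches.append(pid)
    return matches


print(placements_with(conn, "attr1", 30))
print(placements_with(conn, "attr4", "A4"))
```

Some databases offer JSON operators that can push such a filter into the query itself; whether that is available and fast enough depends on the product, so treat the in-application filter here as the lowest common denominator.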

  2. An elegant way:
MediaTypes
+ id
+ name

Placements
+ id
+ mediaTypeId
+ name

DetailAttributes //table of attributes for any type 
+ id
+ name
+ mediaTypeId

PlacementDetailAttributes //many to many rel between DetailAttributes&Placements
+ placementId
+ detailAttributeId
+ value

With this approach you can add as many attributes as you want. Filtering queries by attribute should work too.
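
Here is a small sketch of the four-table layout with one attribute filter, written with Python's sqlite3 module. The sample media types, attribute names and the placements_where helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE MediaTypes (
    id   TEXT NOT NULL PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Placements (
    id          TEXT NOT NULL PRIMARY KEY,
    mediaTypeId TEXT NOT NULL REFERENCES MediaTypes (id),
    name        TEXT NOT NULL
);
CREATE TABLE DetailAttributes (
    id          TEXT NOT NULL PRIMARY KEY,
    name        TEXT NOT NULL,
    mediaTypeId TEXT NOT NULL REFERENCES MediaTypes (id)
);
CREATE TABLE PlacementDetailAttributes (
    placementId       TEXT NOT NULL REFERENCES Placements (id),
    detailAttributeId TEXT NOT NULL REFERENCES DetailAttributes (id),
    value             TEXT NOT NULL,   -- every value is stored as text
    PRIMARY KEY (placementId, detailAttributeId)
);

INSERT INTO MediaTypes VALUES ('video', 'Video'), ('print', 'Print');
INSERT INTO Placements VALUES ('p1', 'video', 'Pre-roll'),
                              ('p2', 'print', 'Back cover');
INSERT INTO DetailAttributes VALUES ('d1', 'duration', 'video'),
                                    ('d2', 'paperSize', 'print');
INSERT INTO PlacementDetailAttributes VALUES ('p1', 'd1', '30'),
                                             ('p2', 'd2', 'A4');
""")


def placements_where(conn, attr_name, value):
    """Return ids of placements whose named attribute equals `value`."""
    rows = conn.execute(
        """
        SELECT p.id
            FROM Placements p
            JOIN PlacementDetailAttributes pda ON pda.placementId = p.id
            JOIN DetailAttributes da           ON da.id = pda.detailAttributeId
            WHERE da.name = ? AND pda.value = ?
            ORDER BY p.id
        """,
        (attr_name, value),
    )
    return [row[0] for row in rows]


print(placements_where(conn, "duration", "30"))
print(placements_where(conn, "paperSize", "A4"))
```

One trade-off visible in the sketch: because all values share a single value column, they are stored as text, and the database cannot enforce a per-attribute datatype or a required-attribute rule the way dedicated columns can.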



Source: https://stackoverflow.com/questions/58862651/how-do-i-get-around-this-relational-database-design-smell
