How to model a database with many m:n relations on a table

孤城傲影 2021-01-02 06:25

I am currently setting up a database which has a large number of many-to-many relations. Every relationship was modeled via a link table. Example:

A person has a num

5 Answers
  • 2021-01-02 07:05

    In my humble opinion, I would go for the first model. It's probably the more complex model, but in the end it will make things easier when you're extracting info from the tables. With the second model, the application code could get dirtier or less readable for other programmers. Besides, there are some authors who wouldn't recommend using multipurpose tables like that.

    In the end you must go with whatever suits you better. We don't know the whole context, so we can't help you much with the decision. But from what you're saying, I'd definitely go for option number one.

  • 2021-01-02 07:07

    The second model is a problem from several perspectives. First, it is likely to create blocking issues, as everything goes to the one meta table. Second, it is far more likely to have data integrity issues, as you can't enforce the foreign key constraints. Modeling that way is a SQL antipattern. The first model was correct.
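
    To make the foreign key point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Person and Restaurant tables and their column types are assumptions for illustration; only the Person_Restaurant name comes from this thread. It shows a dedicated link table rejecting a reference to a restaurant that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Person and Restaurant are hypothetical parent tables for this sketch.
    CREATE TABLE Person     (personId     INTEGER PRIMARY KEY);
    CREATE TABLE Restaurant (restaurantId INTEGER PRIMARY KEY);
    CREATE TABLE Person_Restaurant (
        personId     INTEGER NOT NULL REFERENCES Person(personId),
        restaurantId INTEGER NOT NULL REFERENCES Restaurant(restaurantId),
        PRIMARY KEY (personId, restaurantId)
    );
    INSERT INTO Person VALUES (1234);
    INSERT INTO Restaurant VALUES (5678);
""")


def try_insert(person_id, restaurant_id):
    """Try to link a person to a restaurant and report what the database did."""
    try:
        conn.execute(
            "INSERT INTO Person_Restaurant VALUES (?, ?)",
            (person_id, restaurant_id),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


valid = try_insert(1234, 5678)     # both parent rows exist
dangling = try_insert(1234, 9999)  # restaurant 9999 was never created
print(valid, "|", dangling)
```

    A single generic table that stores ids for several kinds of entities in one column has no single parent table to reference, which is why the database cannot do this check for you there.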

  • 2021-01-02 07:10

    Your simplified version does not represent a proper relational model. It's more of a metadata model.

    The number of tables in your database should represent the number of logical entities in your domain. That should not change based on some arbitrary idea of how many entities is too many.

  • 2021-01-02 07:11

    Your design violates Fourth Normal Form. You're trying to store multiple "facts" in one table, and it leads to anomalies.

    The Person_Attributes table should look something like this:

    personId jobId houseId restaurantId

    So if I associate with one job, one house, but two restaurants, do I store the following?

    personId jobId houseId restaurantId
        1234    42      87         5678
        1234    42      87         9876
    

    And if I add a third restaurant, I copy the other columns?

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234   123      87         9876
        1234    42      87        13579 
    

    Done! Oh, wait, what happened there? I changed jobs at the same time as adding the new restaurant. Now I'm incorrectly associated with two jobs, but there's no way to distinguish between that and correctly being associated with two jobs.

    Also, even if it is correct to be associated with two jobs, shouldn't the data look like this?

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234   123      87         9876
        1234   123      87        13579 
        1234    42      87         5678
        1234    42      87         9876
        1234    42      87        13579 
    

    It starts looking like a Cartesian product of all distinct values of jobId, houseId, and restaurantId. In fact, it is -- because this table is trying to store multiple independent facts.
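
    One way to see that this really is a Cartesian product: store the three facts in separate tables and join them on personId. The sketch below uses Python's sqlite3 module. Person_Job and Person_House are table names assumed for illustration, and the ids are the ones from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One intersection table per independent fact (names partly hypothetical).
    CREATE TABLE Person_Job        (personId INTEGER, jobId INTEGER,
                                    PRIMARY KEY (personId, jobId));
    CREATE TABLE Person_House      (personId INTEGER, houseId INTEGER,
                                    PRIMARY KEY (personId, houseId));
    CREATE TABLE Person_Restaurant (personId INTEGER, restaurantId INTEGER,
                                    PRIMARY KEY (personId, restaurantId));
    INSERT INTO Person_Job        VALUES (1234, 123), (1234, 42);
    INSERT INTO Person_House      VALUES (1234, 87);
    INSERT INTO Person_Restaurant VALUES (1234, 5678), (1234, 9876), (1234, 13579);
""")

# Joining the independent facts on personId multiplies them together:
# 2 jobs x 1 house x 3 restaurants = 6 rows.
rows = conn.execute("""
    SELECT j.personId, j.jobId, h.houseId, r.restaurantId
    FROM Person_Job j
    JOIN Person_House h      ON h.personId = j.personId
    JOIN Person_Restaurant r ON r.personId = j.personId
    ORDER BY j.jobId DESC, r.restaurantId
""").fetchall()

for row in rows:
    print(row)
```

    The six rows printed are the same six rows as in the combined table above, built from only 2 + 1 + 3 stored rows.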

    Correct relational design requires a separate intersection table for each many-to-many relationship. Sorry, you have not found a shortcut.

    (Many articles about normalization say the higher normal forms past 3NF are esoteric, and one never has to worry about 4NF or 5NF. Let this example disprove that claim.)


    Re your comment about using NULL: Then you have a problem enforcing uniqueness, because a PRIMARY KEY constraint requires that all columns be NOT NULL.

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234  NULL    NULL         9876
        1234  NULL    NULL        13579 
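
    A small sketch of the uniqueness problem, using Python's sqlite3 module. Since the nullable columns cannot be part of a primary key, this assumes a UNIQUE constraint as the fallback; the column types are also assumptions. SQLite treats NULLs as distinct from each other in a UNIQUE constraint, so an exact duplicate row is accepted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person_Attributes (
        personId     INTEGER NOT NULL,
        jobId        INTEGER,          -- nullable, so no PRIMARY KEY over these
        houseId      INTEGER,
        restaurantId INTEGER,
        UNIQUE (personId, jobId, houseId, restaurantId)
    )
""")

row = (1234, None, None, 9876)
insert_sql = "INSERT INTO Person_Attributes VALUES (?, ?, ?, ?)"
conn.execute(insert_sql, row)
conn.execute(insert_sql, row)  # exact duplicate, but no constraint error

duplicates = conn.execute(
    "SELECT COUNT(*) FROM Person_Attributes "
    "WHERE personId = 1234 AND restaurantId = 9876"
).fetchone()[0]
print(duplicates)
```

    Other database engines may handle NULLs in unique constraints differently, so treat this as an illustration of the risk rather than a statement about every product.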
    

    Also, if I add a second house or a second jobId to the above table, which row do I put it in? You could end up with this:

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234  NULL    NULL         9876
        1234    42    NULL        13579 
    

    Now if I disassociate restaurantId 9876, I could update it to NULL. But that leaves a row of all NULLs, which I really should just delete.

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234  NULL    NULL         NULL
        1234    42    NULL        13579 
    

    Whereas if I had disassociated restaurant 13579, I could update it to NULL and leave the row in place.

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234  NULL    NULL         9876
        1234    42    NULL         NULL 
    

    But shouldn't I consolidate rows, moving the jobId to another row, provided there's a vacancy in that column?

    personId jobId houseId restaurantId
        1234   123      87         5678
        1234    42    NULL         9876
    

    The trouble is, now it's getting more and more complex to add or remove associations, requiring multiple SQL statements for changes. You're going to have to write a lot of tedious application code to handle this complexity.

    However, all the various changes are easy if you define one table per many-to-many relationship. You do need the complexity of having that many more tables, but by doing that you will simplify your application code.

    Adding an association to a restaurant is simply an INSERT to the Person_Restaurant table. Removing that association is simply a DELETE. It doesn't matter how many associations there are to jobs or houses. And you can define a primary key constraint in each of these intersection tables to enforce uniqueness.
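
    As a rough sketch of how little code this takes, here is the Person_Restaurant table driven from Python's sqlite3 module. The helper function names and the column types are made up for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person_Restaurant (
        personId     INTEGER NOT NULL,
        restaurantId INTEGER NOT NULL,
        PRIMARY KEY (personId, restaurantId)  -- each pair can exist only once
    )
""")


def add_restaurant(person_id, restaurant_id):
    """Associate a person with a restaurant: one INSERT."""
    conn.execute(
        "INSERT INTO Person_Restaurant VALUES (?, ?)",
        (person_id, restaurant_id),
    )


def remove_restaurant(person_id, restaurant_id):
    """Drop the association: one DELETE."""
    conn.execute(
        "DELETE FROM Person_Restaurant WHERE personId = ? AND restaurantId = ?",
        (person_id, restaurant_id),
    )


def restaurants_of(person_id):
    """Return the restaurant ids currently linked to a person."""
    cursor = conn.execute(
        "SELECT restaurantId FROM Person_Restaurant "
        "WHERE personId = ? ORDER BY restaurantId",
        (person_id,),
    )
    return [restaurant_id for (restaurant_id,) in cursor]


for restaurant_id in (5678, 9876, 13579):
    add_restaurant(1234, restaurant_id)
remove_restaurant(1234, 9876)

try:
    add_restaurant(1234, 5678)  # already linked
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True   # the primary key blocks the duplicate

print(restaurants_of(1234), duplicate_rejected)
```

    Nothing here needs to know how many jobs or houses the person has, and there are no NULL slots to shuffle around.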

  • 2021-01-02 07:29

    I do not think the second method is correct, because your Person_Attributes table would contain redundant data. For example, say a person likes 10 restaurants, works 2 jobs, and has 3 houses: you would have as many as 10 * 2 * 3 = 60 entries where there should be 10 + 2 + 3 = 15 (in 3 link tables, as per approach #1). Think of the drawbacks with a million users, or with more than 3 attributes to handle in the Person_Attributes table. So I would go with approach 1 in your question.
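
    The row counts can be checked with a few lines of Python. The ids (P1, H1, J1, R1 and so on) are placeholders, not real data:

```python
from itertools import product

restaurants = [f"R{i}" for i in range(1, 11)]  # 10 restaurants
jobs = ["J1", "J2"]                            # 2 jobs
houses = ["H1", "H2", "H3"]                    # 3 houses

# Approach 2: one combined table needs every combination for the person.
combined_rows = list(product(["P1"], houses, jobs, restaurants))

# Approach 1: three link tables, one row per association.
link_rows = (
    [("P1", house) for house in houses]
    + [("P1", job) for job in jobs]
    + [("P1", restaurant) for restaurant in restaurants]
)

print(len(combined_rows), len(link_rows))
```

    Adding one more house adds one row to the link tables, but 20 more rows (2 jobs times 10 restaurants) to the combined table.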

    Say, for example, your Person_Attributes table has the following entry:

    personId | houseId | jobId | restaurantId
    ------------------------------------------
    P1       | H1      | J1    | R1
    

    Now, if the person likes restaurants R2 and R3, the table looks like this:

    P1      H1      J1      R1
    P1      H1      J1      R2
    P1      H1      J1      R3
    

    The table already has redundant data. If he adds job J2 at a later point, your table will look like this:

    P1      H1      J1      R1
    P1      H1      J1      R2
    P1      H1      J1      R3
    P1      H1      J2      R1
    P1      H1      J2      R2
    P1      H1      J2      R3
    

    Now consider that he adds another home, H2, and so on and so forth. Do you see my point?
