What is the best way to represent a many-to-many relationship between records in a single SQL table?

怎甘沉沦 提交于 2019-12-03 14:58:49

Define a constraint: entity_id_a < entity_id_b.

Create indexes:

CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
CREATE INDEX ix_b ON entity_entity(entity_id_b);

Second index doesn't need to include entity_id_a as you will use it only to select all a's within one b. RANGE SCAN on ix_b will be faster than a SKIP SCAN on ix_a_b.

Populate the table with your entities as follows:

INSERT
INTO entity_entity (entity_id_a, entity_id_b)
VALUES (LEAST(@id1, @id2), GREATEST(@id1, @id2))

Then select:

SELECT entity_id_b
FROM entity_entity
WHERE entity_id_a = @id
UNION ALL
SELECT entity_id_a
FROM entity_entity
WHERE entity_id_b = @id

UNION ALL here lets you use above indexes and avoid extra sorting for uniqueness.

All above is valid for a symmetric and anti-reflexive relationship. That means that:

  • If a is related to b, then b is related to a

  • a is never related to a

I think the structure you have suggested is fine.

To get the related records do something like

SELECT related.* FROM entities AS search 
LEFT JOIN entity_entity map ON map.entity_id_a = search.id
LEFT JOIN entities AS related ON map.entity_id_b = related.id
WHERE search.name = 'Search term'

Hope that helps.

The link table approach seems fine, except that you might want a 'relationship type' so that you know WHY they are related.

For example, the relation between Raleigh and North Carolina is not the same as a relation between Raleigh and Durham. Additionally, you may want to know who is the 'parent' in the relationship, in case you were driving conditional drop-downs. (i.e. You select a State, you get to see the cities that are in the state).

Depending on the complexity of your requirements, the simple setup you have right now may not be sufficient. If you simply need to show that two records are related in some way, the link table should be sufficient.

I already posted a way to do it in your design, but I also wanted to offer this separate design insight if you have some flexibility in your design and this more closely fits your needs.

If the items are in (non-overlapping) equivalence classes, you might want to make equivalence classes the basis for the table design, where everything in class is considered equivalent. The classes themselves can be anonymous:

CREATE TABLE equivalence_class (
    class_id int -- surrogate, IDENTITY, autonumber, etc.
    ,entity_id int
)

entity_id should be unique for a non-overlapping partition of your space.

This avoids the problem of ensuring proper left- or right-handed-ness or forcing an upper-right relationship matrix.

Then your query is a little different:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> @entity_id

or, equivalently:

SELECT c2.entity_id
FROM equivalence_class c1
INNER JOIN equivalence_class c2
    ON c1.entity_id = @entity_id
    AND c1.class_id = c2.class_id
    AND c2.entity_id <> c1.entity_id

I can think of a few ways.

A single pass with a CASE:

SELECT DISTINCT
    CASE
        WHEN entity_id_a <> @entity_id THEN entity_id_a
        WHEN entity_id_b <> @entity_id THEN entity_id_b
    END AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id OR entity_id_b = @entity_id

Or two filtered queries UNIONed thus:

SELECT entity_id_b AS equivalent_entity
FROM entity_entity
WHERE entity_id_a = @entity_id
UNION
SELECT entity_id_a AS equivalent_entity
FROM entity_entity
WHERE entity_id_b = @entity_id
select * from entities
where entity_id in 
(
    select entity_id_b 
    from entity_entity 
    where entity_id_a = @lookup_value
)

Based on your updated schema this query should work:

select if(entity_id_a=:entity_id,entity_id_b,entity_id_a) as related_entity_id where :entity_id in (entity_id_a, entity_id_b)

where :entity_id is bound to the entity you are querying

My advice is that your intial table design is bad. Do not store different types of things in the same table. (First rule of database design, right up there with do not store multiple pieces of information in the same field). This is much harder to query and will cause significant performance problems down the road. Plus it would be a problem entering the data into the realtionship table - how do you know what entities would need to be realted when you do a new entry? It would be much better to design properly relational tables. Entity tables are almost always a bad idea. I see no reason at all from the example to have this type of information in one table. Frankly I'd have a university table and a related address table. It would easy to query and perform far better.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!