Find ID of parent where all children exactly match

两盒软妹~` submitted on 2021-01-27 14:50:37

Question


The Scenario

Let's suppose we have a set of database tables that represent four key concepts:

  1. Entity Types (e.g. account, client, etc.)
  2. Entities (e.g. instances of the above Entity Types)
  3. Cohorts (a named group)
  4. Cohort Members (the Entities that form up the membership of a Cohort)

The rules around Cohorts are:

  1. A Cohort always has at least one Cohort Member.
  2. A Cohort's Members must be unique to that Cohort (i.e. Entity 5 cannot be a member of Cohort 3 twice, though it could be a member of both Cohort 3 and Cohort 4)
  3. No two Cohorts will ever be entirely equal in membership, though one Cohort may legitimately be a subset of another Cohort.

The rules around Entities are:

  1. No two Entities may have the same value pair (business_key, entity_type_id)
  2. Two entities with a different entity_type_id may share a business_key
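As a sketch of how the Entity and Cohort Member rules above might be enforced with constraints, here is an in-memory SQLite reconstruction. The table and column names come from the queries in this thread, but the exact DDL, the entity_type table and its name column, and all IDs are assumptions (the fiddle targets SQL Server and its schema is not reproduced here). Cohort rules 1 and 3 (at least one member, no two identical cohorts) cannot be expressed as simple column constraints and are left out.

```python
import sqlite3

# Hypothetical schema reconstructed from the rules above; the real fiddle
# targets SQL Server and may differ.
SCHEMA = """
CREATE TABLE entity_type (
    entity_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE entity (
    entity_id      INTEGER PRIMARY KEY,
    business_key   TEXT NOT NULL,
    entity_type_id INTEGER NOT NULL REFERENCES entity_type (entity_type_id),
    UNIQUE (business_key, entity_type_id)      -- entity rule 1
);
CREATE TABLE cohort (
    cohort_id INTEGER PRIMARY KEY
);
CREATE TABLE cohort_member (
    cohort_id INTEGER NOT NULL REFERENCES cohort (cohort_id),
    entity_id INTEGER NOT NULL REFERENCES entity (entity_id),
    PRIMARY KEY (cohort_id, entity_id)         -- cohort rule 2
);
"""

def violates(conn, sql, params):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entity_type VALUES (?, ?)", [(1, "account"), (2, "client")])
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", [(1, "acc1", 1), (2, "cli1", 2)])
conn.execute("INSERT INTO cohort VALUES (3)")
conn.execute("INSERT INTO cohort_member VALUES (3, 1)")

# Entity rule 2: the same business_key under a different type is allowed.
same_key_other_type = violates(conn, "INSERT INTO entity VALUES (?, ?, ?)", (3, "acc1", 2))
# Entity rule 1: a repeated (business_key, entity_type_id) pair is rejected.
duplicate_pair = violates(conn, "INSERT INTO entity VALUES (?, ?, ?)", (4, "acc1", 1))
# Cohort rule 2: an entity cannot join the same cohort twice.
duplicate_member = violates(conn, "INSERT INTO cohort_member VALUES (?, ?)", (3, 1))
print(same_key_other_type, duplicate_pair, duplicate_member)  # False True True
```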

Because pictures tell a thousand lines of code, here is the ERD:

[ERD image not reproduced]


The Question

I want a SQL query that, when provided a collection of (business_key, entity_type_id) pairs, will search for a Cohort that matches exactly, returning one row with just the cohort_id if that Cohort exists, and zero rows otherwise.

i.e. if the set of Entities matches entity_ids 1 and 2, it will only return a cohort_id whose cohort_members are exactly 1 and 2: not just 1, not just 2, and not a cohort with entity_ids 1, 2 and 3. If no cohort satisfies this, zero rows are returned.
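Put differently, the search is a set-equality test on cohort membership. A minimal Python sketch of that semantics, using the entity_ids from the example above and a hypothetical cohorts dictionary:

```python
# Hypothetical cohorts: cohort_id -> set of member entity_ids.
cohorts = {
    1: {1, 2},
    2: {1},
    3: {1, 2, 3},
}

def exact_match(cohorts, wanted):
    """Return the cohort_ids whose members are exactly the entity_ids in `wanted`."""
    wanted = set(wanted)
    return [cid for cid, members in cohorts.items() if members == wanted]

print(exact_match(cohorts, [1, 2]))  # [1]; cohort 2 is a subset, cohort 3 a superset
print(exact_match(cohorts, [2, 3]))  # []; zero rows
```

Because no two cohorts share the same membership (cohort rule 3), the returned list has at most one element, which mirrors the "one row or zero rows" requirement.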


The Test Cases

To help people addressing the question, I have created a fiddle of the tables along with some data that defines various Entity Types, Entities, and Cohorts. There is also a table with test data for matching, named test_cohort. It contains 6 test cohorts that cover various scenarios. The first 5 tests should each match exactly one cohort. The 6th test is a bogus one that checks the zero-row case. When using the test table, only one line of the associated INSERT statement should be uncommented (the fiddle is set up that way initially):

http://sqlfiddle.com/#!18/2d022

My attempt in SQL is the following, though it fails tests #2 and #4 (which can be found in the fiddle):

SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
    ON entity.business_key = test_cohort.business_key
    AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
    ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
    ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
    ON actual_cohort_member.cohort_id = cohort.cohort_id
    AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
    AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
    SUM(CASE WHEN
        actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
        actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
    END) = COUNT(*)
;
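One way to see why tests #2 and #4 fail: in the HAVING clause, SUM(CASE ...) counts the cohort's members that were matched by a test pair, while COUNT(*) counts all of the cohort's members (the RIGHT OUTER JOIN keeps the unmatched ones). Requiring the two to be equal therefore appears to check only that the cohort is a subset of the test set, so a larger test set also picks up every smaller cohort contained in it. The Python model below is a sketch of that reading; the cohort contents are inferred from the test comments in the second answer, not taken from the fiddle.

```python
# Cohort contents as (business_key, entity_type_id) pairs, inferred from the
# second answer's test comments; the real fiddle data may differ.
COHORTS = {
    1: {("acc1", 1), ("acc2", 1)},
    2: {("cli1", 2), ("cli2", 2)},
    3: {("cli1", 2)},
    4: {("acc1", 1), ("acc2", 1), ("cli1", 2), ("cli2", 2)},
    5: {("acc1", 1), ("cli2", 2)},
}
TESTS = {
    1: {("acc1", 1), ("acc2", 1)},
    2: {("cli1", 2), ("cli2", 2)},
    3: {("cli1", 2)},
    4: {("acc1", 1), ("acc2", 1), ("cli1", 2), ("cli2", 2)},
    5: {("acc1", 1), ("cli2", 2)},
    6: {("acc1", 3), ("cli2", 3)},
}

def attempt(test):
    # Model of SUM(matched) = COUNT(*): every member of the cohort is in the test set.
    return sorted(cid for cid, members in COHORTS.items() if members <= test)

def exact(test):
    # An exact match also needs the sizes to agree, i.e. set equality.
    return sorted(cid for cid, members in COHORTS.items() if members == test)

for t, pairs in TESTS.items():
    print(t, attempt(pairs), exact(pairs))
```

Under these assumptions only tests #2 and #4 differ: test #2 also hits cohort 3, and test #4 hits every cohort, which agrees with the failures reported above.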

Answer 1:


This can be achieved by adding a compound condition to the WHERE clause, since you are comparing against pairs of values. Then count the rows that satisfy the WHERE conditions, and also count the total number of rows for each cohort_id; both counts must equal the number of pairs.

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
WHERE   (e.entity_type_id = 1 AND e.business_key = 'acc1')      -- condition here
         OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP   BY c.cohort_id
HAVING  COUNT(*) = 2                                            -- must equal the number of conditions
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = 2             -- must also equal the number of conditions
  • Test Case #1
  • Test Case #2
  • Test Case #3
  • Test Case #4
  • Test Case #5
  • Test Case #6

As you can see in the test cases above, the value compared in the HAVING clause depends on the number of conditions in the WHERE clause, so it is advisable to build this query dynamically.
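A sketch of such a dynamic query in Python, using the standard-library sqlite3 module as a stand-in for SQL Server. The function name find_cohort and the sample data (including the entity IDs) are hypothetical; the cohort contents follow the test comments in the second answer. Only the shape of the SQL is generated; the values still go through ? placeholders.

```python
import sqlite3

def build_conn():
    # Hypothetical data: entity IDs are invented, cohorts follow the test comments.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                             business_key TEXT, entity_type_id INTEGER);
        CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY);
        CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER);
        INSERT INTO entity VALUES (1,'acc1',1),(2,'acc2',1),(3,'cli1',2),(4,'cli2',2);
        INSERT INTO cohort VALUES (1),(2),(3),(4),(5);
        INSERT INTO cohort_member VALUES (1,1),(1,2),(2,3),(2,4),(3,3),
            (4,1),(4,2),(4,3),(4,4),(5,1),(5,4);
    """)
    return conn

def find_cohort(conn, pairs):
    """Run the first query with its WHERE clause and both counts built from `pairs`."""
    pairs = list(dict.fromkeys(pairs))  # drop repeated (business_key, entity_type_id) pairs
    if not pairs:
        return []
    conditions = " OR ".join(["(e.entity_type_id = ? AND e.business_key = ?)"] * len(pairs))
    sql = f"""
        SELECT c.cohort_id
        FROM cohort c
        JOIN cohort_member cm ON c.cohort_id = cm.cohort_id
        JOIN entity e ON cm.entity_id = e.entity_id
        WHERE {conditions}
        GROUP BY c.cohort_id
        HAVING COUNT(*) = ?
           AND (SELECT COUNT(*) FROM cohort_member cm2
                WHERE cm2.cohort_id = c.cohort_id) = ?
    """
    # Parameters follow the placeholder order: entity_type_id, then business_key.
    params = [v for key, type_id in pairs for v in (type_id, key)]
    params += [len(pairs), len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]

conn = build_conn()
print(find_cohort(conn, [("acc1", 1), ("acc2", 1)]))  # [1]
print(find_cohort(conn, [("cli1", 2), ("cli2", 2)]))  # [2]
print(find_cohort(conn, [("acc1", 3), ("cli2", 3)]))  # []
```

Deduplicating the input pairs is an addition of this sketch: a repeated pair would raise the expected count without producing extra matched rows, so no cohort could ever match.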

UPDATE

If the table test_cohort contains only one scenario, the following query meets your requirement. If test_cohort contains a list of scenarios, you might want to look at the other answer instead: this solution does not alter any table schema, whereas the other one adds a column to identify each scenario.

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
        INNER JOIN test_cohort tc
            ON tc.business_key = e.business_key
                AND tc.entity_type_id = e.entity_type_id
GROUP   BY c.cohort_id
HAVING  COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
  • Test Case #1
  • Test Case #2
  • Test Case #3
  • Test Case #4
  • Test Case #5
  • Test Case #6
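Because this version reads its pairs from test_cohort, checking several scenarios means emptying and reloading that table for each one. A sketch of that loop against an in-memory SQLite copy; the sample data is hypothetical (entity IDs invented, cohorts inferred from the second answer's test comments), and SQLite stands in for SQL Server.

```python
import sqlite3

# Hypothetical data; test_cohort has no scenario column in this version.
SETUP = """
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                     business_key TEXT, entity_type_id INTEGER);
CREATE TABLE cohort (cohort_id INTEGER PRIMARY KEY);
CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER);
CREATE TABLE test_cohort (business_key TEXT, entity_type_id INTEGER);
INSERT INTO entity VALUES (1,'acc1',1),(2,'acc2',1),(3,'cli1',2),(4,'cli2',2);
INSERT INTO cohort VALUES (1),(2),(3),(4),(5);
INSERT INTO cohort_member VALUES (1,1),(1,2),(2,3),(2,4),(3,3),
    (4,1),(4,2),(4,3),(4,4),(5,1),(5,4);
"""

# The updated query from this answer, unchanged apart from layout.
QUERY = """
SELECT c.cohort_id
FROM cohort c
JOIN cohort_member cm ON c.cohort_id = cm.cohort_id
JOIN entity e ON cm.entity_id = e.entity_id
JOIN test_cohort tc ON tc.business_key = e.business_key
                   AND tc.entity_type_id = e.entity_type_id
GROUP BY c.cohort_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
   AND (SELECT COUNT(*) FROM cohort_member cm2
        WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
"""

SCENARIOS = [
    [("acc1", 1), ("acc2", 1)],
    [("cli1", 2), ("cli2", 2)],
    [("cli1", 2)],
    [("acc1", 1), ("acc2", 1), ("cli1", 2), ("cli2", 2)],
    [("acc1", 1), ("cli2", 2)],
    [("acc1", 3), ("cli2", 3)],
]

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
results = []
for pairs in SCENARIOS:
    # test_cohort holds one scenario at a time, so empty it and reload.
    conn.execute("DELETE FROM test_cohort")
    conn.executemany("INSERT INTO test_cohort VALUES (?, ?)", pairs)
    results.append([row[0] for row in conn.execute(QUERY)])
print(results)  # [[1], [2], [3], [4], [5], []]
```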



Answer 2:


I have added a column i to your test_cohort table so that you can test all your scenarios at the same time. Here is the DDL:

CREATE TABLE test_cohort (
i int,
business_key NVARCHAR(255),
entity_type_id INT
);

INSERT INTO test_cohort VALUES
(1, 'acc1', 1), (1, 'acc2', 1) -- TEST #1: should match against cohort 1
,(2, 'cli1', 2), (2, 'cli2', 2) -- TEST #2: should match against cohort 2
,(3, 'cli1', 2) -- TEST #3: should match against cohort 3
,(4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2) -- TEST #4: should match against cohort 4
,(5, 'acc1', 1), (5, 'cli2', 2) -- TEST #5: should match against cohort 5
,(6, 'acc1', 3), (6, 'cli2', 3) -- TEST #6: should not match any cohort

And the query:

select
    c.i, m.cohort_id
from
    (
        select 
            *, cnt = count(*) over (partition by i)
        from 
            test_cohort
    ) c
    join entity e on c.entity_type_id = e.entity_type_id and c.business_key = e.business_key
    join (
        select
            *, cnt = count(*) over (partition by cohort_id)
        from
            cohort_member
    ) m on e.entity_id = m.entity_id and c.cnt = m.cnt
group by m.cohort_id, c.cnt, c.i
having count(*) = c.cnt

Output

i   cohort_id
------------
1   1
2   2
3   3
4   4
5   5

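The idea is to count the rows of each test scenario and of each cohort before the join, then keep only the cohort whose number of matched rows equals both counts, which is exactly an exact match.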
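The same query can be checked outside SQL Server. This sketch ports it to SQLite through Python's sqlite3 module, which is an assumption: SQLite needs version 3.25 or later for window functions, and it requires `expr AS alias` where T-SQL also accepts `alias = expr`. The entity IDs and cohort_member rows are hypothetical, chosen to be consistent with the test comments above, and an ORDER BY is added so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (entity_id INTEGER PRIMARY KEY,
                         business_key TEXT, entity_type_id INTEGER);
    CREATE TABLE cohort_member (cohort_id INTEGER, entity_id INTEGER);
    CREATE TABLE test_cohort (i INTEGER, business_key TEXT, entity_type_id INTEGER);
    INSERT INTO entity VALUES (1,'acc1',1),(2,'acc2',1),(3,'cli1',2),(4,'cli2',2);
    INSERT INTO cohort_member VALUES (1,1),(1,2),(2,3),(2,4),(3,3),
        (4,1),(4,2),(4,3),(4,4),(5,1),(5,4);
    INSERT INTO test_cohort VALUES
        (1,'acc1',1),(1,'acc2',1),
        (2,'cli1',2),(2,'cli2',2),
        (3,'cli1',2),
        (4,'acc1',1),(4,'acc2',1),(4,'cli1',2),(4,'cli2',2),
        (5,'acc1',1),(5,'cli2',2),
        (6,'acc1',3),(6,'cli2',3);
""")

QUERY = """
SELECT c.i, m.cohort_id
FROM (SELECT *, COUNT(*) OVER (PARTITION BY i) AS cnt          -- rows per scenario
      FROM test_cohort) c
JOIN entity e
  ON c.entity_type_id = e.entity_type_id AND c.business_key = e.business_key
JOIN (SELECT *, COUNT(*) OVER (PARTITION BY cohort_id) AS cnt  -- rows per cohort
      FROM cohort_member) m
  ON e.entity_id = m.entity_id AND c.cnt = m.cnt
GROUP BY m.cohort_id, c.cnt, c.i
HAVING COUNT(*) = c.cnt
ORDER BY c.i
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```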



Source: https://stackoverflow.com/questions/48699160/find-id-of-parent-where-all-children-exactly-match
