How to filter SQL results in a has-many-through relation

有刺的猬 2020-11-21 05:17

Assuming I have the tables student, club, and student_club:

student {
    id
    name
}
club {
    id
    name
}
student_club {
    student_id
    club_id
}

13 answers
  • 2020-11-21 05:43

    Different query plans for queries 2) and 10)

    I tested on a real-life database, so the names differ from those in the cat-skinning list. It's a backup copy, so nothing changed during the test runs (except minor changes to the catalogs).

    Query 2)

    SELECT a.*
    FROM   ef.adr a
    JOIN (
        SELECT adr_id
        FROM   ef.adratt
        WHERE  att_id IN (10,14)
        GROUP  BY adr_id
        HAVING COUNT(*) > 1) t using (adr_id);
    
    Merge Join  (cost=630.10..1248.78 rows=627 width=295) (actual time=13.025..34.726 rows=67 loops=1)
      Merge Cond: (a.adr_id = adratt.adr_id)
      ->  Index Scan using adr_pkey on adr a  (cost=0.00..523.39 rows=5767 width=295) (actual time=0.023..11.308 rows=5356 loops=1)
      ->  Sort  (cost=630.10..636.37 rows=627 width=4) (actual time=12.891..13.004 rows=67 loops=1)
            Sort Key: adratt.adr_id
            Sort Method:  quicksort  Memory: 28kB
            ->  HashAggregate  (cost=450.87..488.49 rows=627 width=4) (actual time=12.386..12.710 rows=67 loops=1)
                  Filter: (count(*) > 1)
                  ->  Bitmap Heap Scan on adratt  (cost=97.66..394.81 rows=2803 width=4) (actual time=0.245..5.958 rows=2811 loops=1)
                        Recheck Cond: (att_id = ANY ('{10,14}'::integer[]))
                        ->  Bitmap Index Scan on adratt_att_id_idx  (cost=0.00..94.86 rows=2803 width=0) (actual time=0.217..0.217 rows=2811 loops=1)
                              Index Cond: (att_id = ANY ('{10,14}'::integer[]))
    Total runtime: 34.928 ms
    

    Query 10)

    WITH two AS (
        SELECT adr_id
        FROM   ef.adratt
        WHERE  att_id IN (10,14)
        GROUP  BY adr_id
        HAVING COUNT(*) > 1
        )
    SELECT a.*
    FROM   ef.adr a
    JOIN   two using (adr_id);
    
    Hash Join  (cost=1161.52..1261.84 rows=627 width=295) (actual time=36.188..37.269 rows=67 loops=1)
      Hash Cond: (two.adr_id = a.adr_id)
      CTE two
        ->  HashAggregate  (cost=450.87..488.49 rows=627 width=4) (actual time=13.059..13.447 rows=67 loops=1)
              Filter: (count(*) > 1)
              ->  Bitmap Heap Scan on adratt  (cost=97.66..394.81 rows=2803 width=4) (actual time=0.252..6.252 rows=2811 loops=1)
                    Recheck Cond: (att_id = ANY ('{10,14}'::integer[]))
                    ->  Bitmap Index Scan on adratt_att_id_idx  (cost=0.00..94.86 rows=2803 width=0) (actual time=0.226..0.226 rows=2811 loops=1)
                          Index Cond: (att_id = ANY ('{10,14}'::integer[]))
      ->  CTE Scan on two  (cost=0.00..50.16 rows=627 width=4) (actual time=13.065..13.677 rows=67 loops=1)
      ->  Hash  (cost=384.68..384.68 rows=5767 width=295) (actual time=23.097..23.097 rows=5767 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 1153kB
            ->  Seq Scan on adr a  (cost=0.00..384.68 rows=5767 width=295) (actual time=0.005..10.955 rows=5767 loops=1)
    Total runtime: 37.482 ms
    
  • 2020-11-21 05:50
    SELECT *
    FROM   student
    WHERE  id IN (SELECT student_id
                  FROM   student_club
                  WHERE  club_id = 30
                  INTERSECT
                  SELECT student_id
                  FROM   student_club
                  WHERE  club_id = 50)  
    

    Or, a more general solution that is easier to extend to n clubs and avoids both INTERSECT (not supported by MySQL before 8.0.31) and IN with a subquery (which performs badly in older MySQL versions):

    SELECT s.id,
           s.name
    FROM   student s
           join student_club sc
             ON s.id = sc.student_id
    WHERE  sc.club_id IN ( 30, 50 )
    GROUP  BY s.id,
              s.name
    HAVING COUNT(DISTINCT sc.club_id) = 2  
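To make the "extend to n clubs" point concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a self-contained engine (the answer itself is about MySQL; SQLite is only a stand-in). It assumes the question's column names (student.id, student_club.student_id, student_club.club_id); the helper students_in_all_clubs and the sample rows are hypothetical. Only the IN list and the number compared in HAVING depend on n.

```python
import sqlite3


def students_in_all_clubs(conn, club_ids):
    """Return (id, name) for students who belong to every club in club_ids.

    Hypothetical helper; column names follow the question's schema.
    """
    club_ids = sorted(set(club_ids))  # a repeated id would inflate the HAVING target
    if not club_ids:
        raise ValueError("need at least one club id")
    placeholders = ", ".join("?" for _ in club_ids)
    sql = f"""
        SELECT s.id, s.name
        FROM   student s
        JOIN   student_club sc ON s.id = sc.student_id
        WHERE  sc.club_id IN ({placeholders})
        GROUP  BY s.id, s.name
        HAVING COUNT(DISTINCT sc.club_id) = ?
        ORDER  BY s.id
    """
    # Bind the club ids for the IN list, then n for the HAVING comparison.
    return conn.execute(sql, [*club_ids, len(club_ids)]).fetchall()


# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_club (student_id INTEGER, club_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO student_club VALUES (1, 30), (1, 50), (1, 70), (2, 30), (3, 50), (3, 30);
""")

print(students_in_all_clubs(conn, [30, 50]))      # [(1, 'Ann'), (3, 'Cid')]
print(students_in_all_clubs(conn, [30, 50, 70]))  # [(1, 'Ann')]
```

Deduplicating the input matters: with a repeated id such as [30, 30, 50], the HAVING target would be 3 while at most 2 distinct clubs can ever match.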
    
  • 2020-11-21 05:51

    So there's more than one way to skin a cat.
    I'll add two more to make the list, well, more complete.

    1) GROUP first, JOIN later

    This assumes a sane data model where (student_id, club_id) is unique in student_club. Martin Smith's second version is somewhat similar, but he joins first and groups later. This should be faster:

    SELECT s.id, s.name
      FROM student s
      JOIN (
       SELECT student_id
         FROM student_club
        WHERE club_id IN (30, 50)
        GROUP BY 1
       HAVING COUNT(*) > 1
           ) sc ON sc.student_id = s.id;
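Why the "sane data model" assumption matters: COUNT(*) counts membership rows, not distinct clubs, so a duplicated (student_id, club_id) row can push a student who is in only one of the two clubs past HAVING COUNT(*) > 1. A minimal sketch, using Python's sqlite3 as a self-contained engine; the run helper and the sample rows are hypothetical, and the derived table is joined with ON sc.student_id = s.id because the question's student table keys on id.

```python
import sqlite3

# Group first, join later (column names follow the question's schema).
GROUP_FIRST = """
    SELECT s.id, s.name
    FROM   student s
    JOIN  (SELECT student_id
           FROM   student_club
           WHERE  club_id IN (30, 50)
           GROUP  BY student_id
           HAVING COUNT(*) > 1) sc ON sc.student_id = s.id
    ORDER  BY s.id
"""


def run(with_primary_key):
    """Hypothetical demo: Bob is only in club 30, but we try to store it twice."""
    conn = sqlite3.connect(":memory:")
    pk = ", PRIMARY KEY (student_id, club_id)" if with_primary_key else ""
    conn.executescript(f"""
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE student_club (student_id INTEGER, club_id INTEGER{pk});
        INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO student_club VALUES (1, 30), (1, 50), (2, 30);
    """)
    try:
        conn.execute("INSERT INTO student_club VALUES (2, 30)")  # duplicate membership
    except sqlite3.IntegrityError:
        pass  # rejected when the primary key exists
    return conn.execute(GROUP_FIRST).fetchall()


print(run(with_primary_key=True))   # [(1, 'Ann')]
print(run(with_primary_key=False))  # [(1, 'Ann'), (2, 'Bob')]  Bob is a false positive
```

If uniqueness can't be guaranteed by a constraint, HAVING COUNT(DISTINCT club_id) = 2 (as in the join-first variant) is the safer spelling.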
    

    2) EXISTS

    And of course there is the classic EXISTS, similar to Derek's variant with IN. Simple and fast. (In MySQL, this should be quite a bit faster than the variant with IN):

    SELECT s.id, s.name
      FROM student s
     WHERE EXISTS (SELECT 1 FROM student_club
                   WHERE  student_id = s.id AND club_id = 30)
       AND EXISTS (SELECT 1 FROM student_club
                   WHERE  student_id = s.id AND club_id = 50);
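A minimal sketch of the EXISTS variant run end to end, assuming the question's column names (so the correlation is on s.id) and using Python's sqlite3 only as a convenient self-contained engine. The sample rows and the index name sc_student_club_idx are hypothetical. Each EXISTS only tests whether a matching row is there, so a student comes back once even though the sample data deliberately stores one membership twice; unlike the GROUP BY/COUNT(*) approach, this variant does not rely on (student_id, club_id) being unique.

```python
import sqlite3

# EXISTS acts as a semi-join: each student row is tested, never multiplied.
EXISTS_QUERY = """
    SELECT s.id, s.name
    FROM   student s
    WHERE  EXISTS (SELECT 1 FROM student_club
                   WHERE  student_id = s.id AND club_id = 30)
    AND    EXISTS (SELECT 1 FROM student_club
                   WHERE  student_id = s.id AND club_id = 50)
    ORDER  BY s.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    -- No key on purpose: Ann's club-30 membership is stored twice.
    CREATE TABLE student_club (student_id INTEGER, club_id INTEGER);
    -- Hypothetical index so each EXISTS probe can be an index lookup.
    CREATE INDEX sc_student_club_idx ON student_club (student_id, club_id);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO student_club VALUES (1, 30), (1, 30), (1, 50), (2, 30);
""")

print(conn.execute(EXISTS_QUERY).fetchall())  # [(1, 'Ann')]
```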
    
  • 2020-11-21 05:52

    @erwin-brandstetter Please benchmark this:

    SELECT s.stud_id, s.name
    FROM   student s, student_club x, student_club y
    WHERE  x.club_id = 30
    AND    s.stud_id = x.stud_id
    AND    y.club_id = 50
    AND    s.stud_id = y.stud_id;
    

    It's like number 6) by @sean, just cleaner, I guess.

  • 2020-11-21 05:53
    -- EXPLAIN ANALYZE
    WITH two AS (
        SELECT c0.student_id
        FROM tmp.student_club c0
        , tmp.student_club c1
        WHERE c0.student_id = c1.student_id
        AND c0.club_id = 30
        AND c1.club_id = 50
        )
    SELECT st.* FROM tmp.student st
    JOIN two ON (two.student_id = st.id);
    

    The query plan:

     Hash Join  (cost=1904.76..1919.09 rows=337 width=15) (actual time=6.937..8.771 rows=324 loops=1)
       Hash Cond: (two.student_id = st.id)
       CTE two
         ->  Hash Join  (cost=849.97..1645.76 rows=337 width=4) (actual time=4.932..6.488 rows=324 loops=1)
               Hash Cond: (c1.student_id = c0.student_id)
               ->  Bitmap Heap Scan on student_club c1  (cost=32.76..796.94 rows=1614 width=4) (actual time=0.667..1.835 rows=1646 loops=1)
                     Recheck Cond: (club_id = 50)
                     ->  Bitmap Index Scan on sc_club_id_idx  (cost=0.00..32.36 rows=1614 width=0) (actual time=0.473..0.473 rows=1646 loops=1)                     
                           Index Cond: (club_id = 50)
               ->  Hash  (cost=797.00..797.00 rows=1617 width=4) (actual time=4.203..4.203 rows=1620 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 57kB
                     ->  Bitmap Heap Scan on student_club c0  (cost=32.79..797.00 rows=1617 width=4) (actual time=0.663..3.596 rows=1620 loops=1)                   
                           Recheck Cond: (club_id = 30)
                           ->  Bitmap Index Scan on sc_club_id_idx  (cost=0.00..32.38 rows=1617 width=0) (actual time=0.469..0.469 rows=1620 loops=1)
                                 Index Cond: (club_id = 30)
       ->  CTE Scan on two  (cost=0.00..6.74 rows=337 width=4) (actual time=4.935..6.591 rows=324 loops=1)
       ->  Hash  (cost=159.00..159.00 rows=8000 width=15) (actual time=1.979..1.979 rows=8000 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 374kB
             ->  Seq Scan on student st  (cost=0.00..159.00 rows=8000 width=15) (actual time=0.093..0.759 rows=8000 loops=1)
     Total runtime: 8.989 ms
    (20 rows)
    

    So it still seems to want the seq scan on student.

  • 2020-11-21 05:56
    SELECT s.*
    FROM student s
    INNER JOIN student_club sc_soccer ON s.id = sc_soccer.student_id
    INNER JOIN student_club sc_baseball ON s.id = sc_baseball.student_id
    WHERE 
     sc_baseball.club_id = 50 AND 
     sc_soccer.club_id = 30
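The self-join needs one extra join of student_club per required club. A minimal sketch of generating those joins for an arbitrary list of club ids, using Python's sqlite3 as a self-contained engine; the helper self_join_query, the aliases sc0, sc1 and so on, and the sample rows are hypothetical. The primary key on (student_id, club_id) matters here as well: with duplicate membership rows, a join could match more than once and the same student would be returned several times.

```python
import sqlite3


def self_join_query(club_ids):
    """Build one INNER JOIN on student_club per club id (hypothetical helper)."""
    if not club_ids:
        raise ValueError("need at least one club id")
    joins, conditions = [], []
    for i, _ in enumerate(club_ids):
        alias = f"sc{i}"
        joins.append(f"INNER JOIN student_club {alias} ON s.id = {alias}.student_id")
        conditions.append(f"{alias}.club_id = ?")  # bound in the same order as club_ids
    return ("SELECT s.* FROM student s\n" + "\n".join(joins)
            + "\nWHERE " + " AND ".join(conditions) + "\nORDER BY s.id")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_club (student_id INTEGER, club_id INTEGER,
                               PRIMARY KEY (student_id, club_id));
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO student_club VALUES (1, 30), (1, 50), (1, 70), (2, 30), (3, 30), (3, 50);
""")

clubs = [30, 50]
print(conn.execute(self_join_query(clubs), clubs).fetchall())  # [(1, 'Ann'), (3, 'Cid')]
```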
    