Which SQL query is faster? Filter on Join criteria or Where clause?

伪装坚强ぢ 2020-11-27 03:36

Compare these 2 queries. Is it faster to put the filter on the join criteria or in the WHERE clause? I have always felt that it is faster on the join criteria b

9 Answers
  • 2020-11-27 03:57

    For inner joins it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (i.e., as if the filter expressions appeared in the join condition).

    Outer joins are a different matter, since the place of the filter changes the semantics of the query.
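
    To make the outer join point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The event/result tables and their rows are hypothetical, and SQLite is only a convenient stand-in; the same LEFT JOIN semantics are standard SQL.

```python
import sqlite3

# Hypothetical schema and data, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result (event_id INTEGER, team_2_score INTEGER);
    INSERT INTO event  VALUES (1, 'final'), (2, 'semifinal');
    INSERT INTO result VALUES (1, 24), (2, 10);
""")

# Filter in the ON clause: every event is kept, and results that
# fail the filter are replaced by NULLs
on_rows = conn.execute("""
    SELECT e.id, r.team_2_score
    FROM event e
    LEFT JOIN result r ON e.id = r.event_id AND r.team_2_score = 24
    ORDER BY e.id
""").fetchall()

# Filter in the WHERE clause: applied after the join, so events
# without a qualifying result are dropped entirely
where_rows = conn.execute("""
    SELECT e.id, r.team_2_score
    FROM event e
    LEFT JOIN result r ON e.id = r.event_id
    WHERE r.team_2_score = 24
    ORDER BY e.id
""").fetchall()

print(on_rows)     # [(1, 24), (2, None)]
print(where_rows)  # [(1, 24)]
```

    With the filter in ON, event 2 survives with a NULL score; with the filter in WHERE, it disappears. That difference is why the placement is a correctness question for outer joins, not a performance one.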

  • 2020-11-27 03:59

    In PostgreSQL they are the same. We know this because if you run EXPLAIN ANALYZE on each of the queries, the plans come out the same. Take this example:

    # explain analyze select e.* from event e join result r on e.id = r.event_id and r.team_2_score=24;
    
                                                      QUERY PLAN                                                   
    ---------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
       Hash Cond: (e.id = r.event_id)
       ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.009..0.010 rows=2 loops=1)
       ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.017..0.017 rows=1 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 9kB
             ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
                   Filter: (team_2_score = 24)
                   Rows Removed by Filter: 1
     Planning time: 0.182 ms
     Execution time: 0.101 ms
    (10 rows)
    
    # explain analyze select e.* from event e join result r on e.id = r.event_id where r.team_2_score=24;
                                                      QUERY PLAN                                                   
    ---------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
       Hash Cond: (e.id = r.event_id)
       ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.010..0.011 rows=2 loops=1)
       ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.010..0.010 rows=1 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 9kB
             ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
                   Filter: (team_2_score = 24)
                   Rows Removed by Filter: 1
     Planning time: 0.140 ms
     Execution time: 0.058 ms
    (10 rows)
    

    Both have the same estimated startup and total costs as well as the same query plan. Also notice that even in the top query, the team_2_score condition is applied as a 'Filter' on the scan rather than as part of the join condition.
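
    The same comparison can be scripted. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, with made-up rows, and the `plan` helper is a hypothetical name. It checks whether the two inner join forms get the same plan and return the same rows.

```python
import sqlite3

# Hypothetical data; SQLite is used here instead of PostgreSQL
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result (event_id INTEGER, team_2_score INTEGER);
    INSERT INTO event  VALUES (1, 'a'), (2, 'b');
    INSERT INTO result VALUES (1, 24), (2, 10);
""")

ON_QUERY = """
    SELECT e.* FROM event e
    JOIN result r ON e.id = r.event_id AND r.team_2_score = 24
"""
WHERE_QUERY = """
    SELECT e.* FROM event e
    JOIN result r ON e.id = r.event_id
    WHERE r.team_2_score = 24
"""

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN (hypothetical helper)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

on_plan, where_plan = plan(ON_QUERY), plan(WHERE_QUERY)
same_plan = on_plan == where_plan
same_rows = conn.execute(ON_QUERY).fetchall() == conn.execute(WHERE_QUERY).fetchall()

print(on_plan)
print(where_plan)
print(same_plan, same_rows)
```

    The exact plan text depends on the SQLite version, so it is printed rather than hard-coded. On a real database, run the equivalent EXPLAIN ANALYZE against your own tables, since the planner's choice depends on indexes and statistics.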

  • 2020-11-27 03:59

    Is it faster? Try it and see.

    Which is easier to read? The first looks more "correct" to me, as the moved condition has nothing really to do with the join.

  • 2020-11-27 04:01

    My guess is the first, because it applies a more specific filter to the data. But you should check the execution plan, as with any optimization, because it can be very different depending on the size of the data, server hardware, etc.

  • 2020-11-27 04:03

    Performance-wise, they are the same (and produce the same plans).

    Logically, you should write the condition so that it still makes sense if you replace INNER JOIN with LEFT JOIN.

    In your case this will look like this:

    SELECT  *
    FROM    TableA a
    LEFT JOIN
            TableXRef x
    ON      x.TableAID = a.ID
            AND a.ID = 1
    LEFT JOIN
            TableB b
    ON      x.TableBID = b.ID
    

    or this:

    SELECT  *
    FROM    TableA a
    LEFT JOIN
            TableXRef x
    ON      x.TableAID = a.ID
    LEFT JOIN
            TableB b
    ON      b.id = x.TableBID
    WHERE   a.id = 1
    

    The former query will not return any actual matches for a.id other than 1, so the latter syntax (with WHERE) is logically more consistent.
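
    A small runnable sketch of that difference, using Python's sqlite3 module. The column types and sample rows are assumptions made up for illustration; only the table and column names come from the queries above.

```python
import sqlite3

# Hypothetical rows: two TableA entries, each linked to one TableB entry
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA    (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableB    (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableXRef (TableAID INTEGER, TableBID INTEGER);
    INSERT INTO TableA    VALUES (1), (2);
    INSERT INTO TableB    VALUES (10), (20);
    INSERT INTO TableXRef VALUES (1, 10), (2, 20);
""")

# a.ID = 1 in the ON clause: rows of TableA with other IDs are still
# returned, just without any match from TableXRef or TableB
on_rows = conn.execute("""
    SELECT a.ID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID AND a.ID = 1
    LEFT JOIN TableB b ON x.TableBID = b.ID
    ORDER BY a.ID
""").fetchall()

# a.ID = 1 in the WHERE clause: only the requested row is returned
where_rows = conn.execute("""
    SELECT a.ID, b.ID
    FROM TableA a
    LEFT JOIN TableXRef x ON x.TableAID = a.ID
    LEFT JOIN TableB b ON b.ID = x.TableBID
    WHERE a.ID = 1
    ORDER BY a.ID
""").fetchall()

print(on_rows)     # [(1, 10), (2, None)]
print(where_rows)  # [(1, 10)]
```

    In the first result, the row for ID 2 comes back with a NULL in place of its TableB match, which is rarely what someone filtering on a.ID = 1 intends.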

  • 2020-11-27 04:05

    With any query optimizer worth a cent, they are identical.
