SQL JOIN - WHERE clause vs. ON clause

深忆病人 2020-11-21 11:56

After reading it, this is not a duplicate of Explicit vs Implicit SQL Joins. The answer may be related (or even the same) but the question is diffe

19 Answers
  • 2020-11-21 12:34

    This is a very common question, so this answer is based on this article I wrote.

    Table relationship

    Considering we have the following post and post_comment tables:

    The post has the following records:

    | id | title     |
    |----|-----------|
    | 1  | Java      |
    | 2  | Hibernate |
    | 3  | JPA       |
    

    and the post_comment has the following three rows:

    | id | review    | post_id |
    |----|-----------|---------|
    | 1  | Good      | 1       |
    | 2  | Excellent | 1       |
    | 3  | Awesome   | 2       |
    

    SQL INNER JOIN

    The SQL JOIN clause allows you to associate rows that belong to different tables. For instance, a CROSS JOIN will create a Cartesian Product containing all possible combinations of rows between the two joining tables.

    While the CROSS JOIN is useful in certain scenarios, most of the time, you want to join tables based on a specific condition. And, that's where INNER JOIN comes into play.

    The SQL INNER JOIN allows us to filter the Cartesian Product of joining two tables based on a condition that is specified via the ON clause.

    SQL INNER JOIN - ON "always true" condition

    If you provide an "always true" condition, the INNER JOIN will not filter the joined records, and the result set will contain the Cartesian Product of the two joining tables.

    For instance, if we execute the following SQL INNER JOIN query:

    SELECT
       p.id AS "p.id",
       pc.id AS "pc.id"
    FROM post p
    INNER JOIN post_comment pc ON 1 = 1
    ORDER BY p.id, pc.id
    

    We will get all combinations of post and post_comment records:

    | p.id    | pc.id      |
    |---------|------------|
    | 1       | 1          |
    | 1       | 2          |
    | 1       | 3          |
    | 2       | 1          |
    | 2       | 2          |
    | 2       | 3          |
    | 3       | 1          |
    | 3       | 2          |
    | 3       | 3          |
    

    So, if the ON clause condition is "always true", the INNER JOIN is simply equivalent to a CROSS JOIN query:

    SELECT
       p.id AS "p.id",
       pc.id AS "pc.id"
    FROM post p
    CROSS JOIN post_comment pc
    WHERE 1 = 1
    ORDER BY p.id, pc.id
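The equivalence above can be checked with a small sketch. It assumes Python's standard `sqlite3` module and an in-memory database, neither of which appears in the original answer; the table definitions are a guess at a minimal schema that matches the sample rows.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_comment (
        id INTEGER PRIMARY KEY,
        review TEXT,
        post_id INTEGER REFERENCES post(id)
    );
    INSERT INTO post VALUES (1, 'Java'), (2, 'Hibernate'), (3, 'JPA');
    INSERT INTO post_comment VALUES
        (1, 'Good', 1), (2, 'Excellent', 1), (3, 'Awesome', 2);
""")

# INNER JOIN with an "always true" ON condition.
inner_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    INNER JOIN post_comment pc ON 1 = 1
    ORDER BY p.id, pc.id
""").fetchall()

# Plain CROSS JOIN over the same two tables.
cross_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    CROSS JOIN post_comment pc
    ORDER BY p.id, pc.id
""").fetchall()

print(len(inner_rows))            # 3 posts x 3 comments
print(inner_rows == cross_rows)   # both queries return the same rows
```

Other database engines should behave the same way for these two queries, but only SQLite is exercised here.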
    

    SQL INNER JOIN - ON "always false" condition

    On the other hand, if the ON clause condition is "always false", then all the joined records are going to be filtered out and the result set will be empty.

    So, if we execute the following SQL INNER JOIN query:

    SELECT
       p.id AS "p.id",
       pc.id AS "pc.id"
    FROM post p
    INNER JOIN post_comment pc ON 1 = 0
    ORDER BY p.id, pc.id
    

    We won't get any result back:

    | p.id    | pc.id      |
    |---------|------------|
    

    That's because the query above is equivalent to the following CROSS JOIN query:

    SELECT
       p.id AS "p.id",
       pc.id AS "pc.id"
    FROM post p
    CROSS JOIN post_comment pc
    WHERE 1 = 0
    ORDER BY p.id, pc.id
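As a rough check of the "always false" case, the sketch below runs both queries against an in-memory SQLite database through Python's `sqlite3` module. The engine and the schema are assumptions made for illustration; only the sample rows come from the answer.

```python
import sqlite3

# Assumption: in-memory SQLite with a minimal schema for the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_comment (
        id INTEGER PRIMARY KEY,
        review TEXT,
        post_id INTEGER REFERENCES post(id)
    );
    INSERT INTO post VALUES (1, 'Java'), (2, 'Hibernate'), (3, 'JPA');
    INSERT INTO post_comment VALUES
        (1, 'Good', 1), (2, 'Excellent', 1), (3, 'Awesome', 2);
""")

# INNER JOIN whose ON condition can never be satisfied.
inner_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    INNER JOIN post_comment pc ON 1 = 0
""").fetchall()

# CROSS JOIN filtered by the same impossible condition in WHERE.
cross_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    CROSS JOIN post_comment pc
    WHERE 1 = 0
""").fetchall()

print(inner_rows)   # empty list
print(cross_rows)   # empty list
```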
    

    SQL INNER JOIN - ON clause using the Foreign Key and Primary Key columns

    The most common ON clause condition is the one that matches the Foreign Key column in the child table with the Primary Key column in the parent table, as illustrated by the following query:

    SELECT
       p.id AS "p.id",
       pc.post_id AS "pc.post_id",
       pc.id AS "pc.id",
       p.title AS "p.title",
       pc.review  AS "pc.review"
    FROM post p
    INNER JOIN post_comment pc ON pc.post_id = p.id
    ORDER BY p.id, pc.id
    

    When executing the above SQL INNER JOIN query, we get the following result set:

    | p.id    | pc.post_id | pc.id      | p.title    | pc.review |
    |---------|------------|------------|------------|-----------|
    | 1       | 1          | 1          | Java       | Good      |
    | 1       | 1          | 2          | Java       | Excellent |
    | 2       | 2          | 3          | Hibernate  | Awesome   |
    

    So, only the records that match the ON clause condition are included in the query result set. In our case, the result set contains all the post rows along with their post_comment records. The post rows that have no associated post_comment are excluded since they cannot satisfy the ON clause condition.

    Again, the above SQL INNER JOIN query is equivalent to the following CROSS JOIN query:

    SELECT
       p.id AS "p.id",
       pc.post_id AS "pc.post_id",
       pc.id AS "pc.id",
       p.title AS "p.title",
       pc.review  AS "pc.review"
    FROM post p, post_comment pc
    WHERE pc.post_id = p.id
    

    The table below lists the full Cartesian product. Only the rows where pc.post_id equals p.id (the first, second, and sixth) satisfy the WHERE clause, and only these records are going to be included in the result set. That's the best way to visualize how the INNER JOIN clause works.

    | p.id | pc.post_id | pc.id | p.title   | pc.review |
    |------|------------|-------|-----------|-----------|
    | 1    | 1          | 1     | Java      | Good      |
    | 1    | 1          | 2     | Java      | Excellent |
    | 1    | 2          | 3     | Java      | Awesome   |
    | 2    | 1          | 1     | Hibernate | Good      |
    | 2    | 1          | 2     | Hibernate | Excellent |
    | 2    | 2          | 3     | Hibernate | Awesome   |
    | 3    | 1          | 1     | JPA       | Good      |
    | 3    | 1          | 2     | JPA       | Excellent |
    | 3    | 2          | 3     | JPA       | Awesome   |
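The "filter the Cartesian product" view can be reproduced in a few lines. This is a sketch under assumptions: Python's `sqlite3` module and an in-memory database are used for convenience, and the filtering step is done in Python purely to mirror the table above, not because a database engine works this way internally.

```python
import sqlite3

# Assumption: in-memory SQLite with a minimal schema for the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_comment (
        id INTEGER PRIMARY KEY,
        review TEXT,
        post_id INTEGER REFERENCES post(id)
    );
    INSERT INTO post VALUES (1, 'Java'), (2, 'Hibernate'), (3, 'JPA');
    INSERT INTO post_comment VALUES
        (1, 'Good', 1), (2, 'Excellent', 1), (3, 'Awesome', 2);
""")

columns = "p.id, pc.post_id, pc.id, p.title, pc.review"

# All nine combinations of post and post_comment rows.
cartesian = conn.execute(f"""
    SELECT {columns}
    FROM post p
    CROSS JOIN post_comment pc
    ORDER BY p.id, pc.id
""").fetchall()

# Keep only the combinations where pc.post_id equals p.id.
kept = [row for row in cartesian if row[0] == row[1]]

# The INNER JOIN with the foreign key condition in ON.
inner_rows = conn.execute(f"""
    SELECT {columns}
    FROM post p
    INNER JOIN post_comment pc ON pc.post_id = p.id
    ORDER BY p.id, pc.id
""").fetchall()

print(len(cartesian))       # size of the Cartesian product
print(kept == inner_rows)   # the filtered product matches the INNER JOIN
```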
    

    Conclusion

    An INNER JOIN statement can be rewritten as a CROSS JOIN with a WHERE clause matching the same condition you used in the ON clause of the INNER JOIN query.

    Note that this only applies to INNER JOIN, not to OUTER JOIN.
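To illustrate why the rewrite does not carry over to outer joins, here is a hedged sketch that compares a LEFT JOIN with the CROSS JOIN plus WHERE form on the same sample data. It assumes Python's `sqlite3` module and an in-memory database; the LEFT JOIN query itself is not part of the original answer.

```python
import sqlite3

# Assumption: in-memory SQLite with a minimal schema for the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_comment (
        id INTEGER PRIMARY KEY,
        review TEXT,
        post_id INTEGER REFERENCES post(id)
    );
    INSERT INTO post VALUES (1, 'Java'), (2, 'Hibernate'), (3, 'JPA');
    INSERT INTO post_comment VALUES
        (1, 'Good', 1), (2, 'Excellent', 1), (3, 'Awesome', 2);
""")

# LEFT JOIN keeps every post, padding missing comments with NULL.
left_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    LEFT JOIN post_comment pc ON pc.post_id = p.id
    ORDER BY p.id, pc.id
""").fetchall()

# CROSS JOIN + WHERE drops posts that have no matching comment.
where_rows = conn.execute("""
    SELECT p.id, pc.id
    FROM post p
    CROSS JOIN post_comment pc
    WHERE pc.post_id = p.id
    ORDER BY p.id, pc.id
""").fetchall()

print(left_rows)    # includes the JPA post (id 3) with a NULL comment id
print(where_rows)   # the JPA post is missing
```

The JPA post has no comments, so it survives only in the LEFT JOIN result.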

  • 2020-11-21 12:35

    I think this distinction can best be explained via the logical order of operations in SQL, which is, simplified:

    • FROM (including joins)
    • WHERE
    • GROUP BY
    • Aggregations
    • HAVING
    • WINDOW
    • SELECT
    • DISTINCT
    • UNION, INTERSECT, EXCEPT
    • ORDER BY
    • OFFSET
    • FETCH

    Joins are not a clause of the select statement, but an operator inside of FROM. As such, all ON clauses belonging to the corresponding JOIN operator have "already happened" logically by the time logical processing reaches the WHERE clause. This means that in the case of a LEFT JOIN, for example, the outer join's semantics have already happened by the time the WHERE clause is applied.

    I've explained the following example more in depth in this blog post. When running this query:

    SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
    FROM actor a
    LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
    WHERE film_id < 10
    GROUP BY a.actor_id, a.first_name, a.last_name
    ORDER BY count(fa.film_id) ASC;
    

    The LEFT JOIN doesn't really have any useful effect, because even if an actor did not play in a film, the actor will be filtered out: its FILM_ID will be NULL, and the WHERE clause will remove such a row. The result is something like:

    ACTOR_ID  FIRST_NAME  LAST_NAME  COUNT
    --------------------------------------
    194       MERYL       ALLEN      1
    198       MARY        KEITEL     1
    30        SANDRA      PECK       1
    85        MINNIE      ZELLWEGER  1
    123       JULIANNE    DENCH      1
    

    I.e. just as if we inner joined the two tables. If we move the filter predicate into the ON clause, it becomes a criterion for the outer join:

    SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
    FROM actor a
    LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
      AND film_id < 10
    GROUP BY a.actor_id, a.first_name, a.last_name
    ORDER BY count(fa.film_id) ASC;
    

    Meaning the result will contain actors without any films, or without any films with FILM_ID < 10:

    ACTOR_ID  FIRST_NAME  LAST_NAME     COUNT
    -----------------------------------------
    3         ED          CHASE         0
    4         JENNIFER    DAVIS         0
    5         JOHNNY      LOLLOBRIGIDA  0
    6         BETTE       NICHOLSON     0
    ...
    1         PENELOPE    GUINESS       1
    200       THORA       TEMPLE        1
    2         NICK        WAHLBERG      1
    198       MARY        KEITEL        1
    

    In short

    Always put your predicate where it makes most sense, logically.

  • 2020-11-21 12:40

    In terms of the optimizer, it shouldn't make a difference whether you define your join clauses with ON or WHERE.

    However, IMHO, it's much clearer to use the ON clause when performing joins. That way you have a specific section of your query that dictates how the join is handled, rather than having the join conditions intermixed with the rest of the WHERE clause.

  • 2020-11-21 12:42

    In SQL, both 'WHERE' and 'ON' are conditional clauses, but they serve different purposes. The 'WHERE' clause is used in SELECT/UPDATE statements to specify filtering conditions, whereas the 'ON' clause is used in joins, where it checks whether the records in the source and target tables match before the tables are joined.

    For Example: - 'WHERE'

    SELECT * FROM employee WHERE employee_id=101
    

    For Example: - 'ON'

    There are two tables employee and employee_details, the matching columns are employee_id.

    SELECT * FROM employee 
    INNER JOIN employee_details 
    ON employee.employee_id = employee_details.employee_id
    

    I hope this answers your question. Let me know if anything needs clarifying.

  • 2020-11-21 12:43

    I think it's an effect of the join sequence. In the first (upper) LEFT JOIN case, SQL performs the left join first and then applies the WHERE filter. In the second (lower) case, it finds Orders.ID=12345 first and then performs the join.

  • 2020-11-21 12:47

    Regarding your question,

    For an inner join, 'on' and 'where' give the same result, as long as your server accepts the syntax:

    select * from a inner join b on a.c = b.c
    

    and

    select * from a inner join b where a.c = b.c
    

    Not all SQL engines accept the 'where' form, so it is probably best avoided. And of course the 'on' clause is clearer.
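As a quick illustration, SQLite is one engine that accepts an INNER JOIN without an ON clause, so both forms can be compared there. The sketch assumes Python's `sqlite3` module and two hypothetical single-column tables `a` and `b`; stricter engines are expected to reject the second query with a syntax error, which is not demonstrated here.

```python
import sqlite3

# Assumption: in-memory SQLite and made-up tables a(c) and b(c).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c INTEGER);
    CREATE TABLE b (c INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Join condition expressed in the ON clause.
on_rows = conn.execute(
    "select * from a inner join b on a.c = b.c"
).fetchall()

# Join condition expressed in WHERE, with no ON clause at all.
# SQLite treats the missing ON as an unconstrained join.
where_rows = conn.execute(
    "select * from a inner join b where a.c = b.c"
).fetchall()

print(on_rows)                 # the single matching pair
print(on_rows == where_rows)   # both forms agree in SQLite
```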
