SQL JOIN - WHERE clause vs. ON clause

深忆病人 2020-11-21 11:56

After reading it, this is not a duplicate of Explicit vs Implicit SQL Joins. The answer may be related (or even the same) but the question is diffe

19 answers
  • 2020-11-21 12:30
    • Does not matter for inner joins

    • Matters for outer joins

      a. WHERE clause: after joining. Records are filtered after the join has taken place.

      b. ON clause: before joining. Records from the right table are filtered before the join, so left-table rows with no surviving match still appear in the result, with NULL in the right table's columns (since it is an OUTER join).



    Example: consider the tables below:

        1. documents:
         | id    | name        |
         |-------|-------------|
         | 1     | Document1   |
         | 2     | Document2   |
         | 3     | Document3   |
         | 4     | Document4   |
         | 5     | Document5   |
    
    
        2. downloads:
         | id   | document_id   | username |
         |------|---------------|----------|
         | 1    | 1             | sandeep  |
         | 2    | 1             | simi     |
         | 3    | 2             | sandeep  |
         | 4    | 2             | reya     |
         | 5    | 3             | simi     |
    

    a) Inside the WHERE clause:

      SELECT documents.name, downloads.id
        FROM documents
        LEFT OUTER JOIN downloads
          ON documents.id = downloads.document_id
        WHERE username = 'sandeep'
    
     For the above query, the intermediate join table will look like this:
    
        | id(from documents) | name         | id (from downloads) | document_id | username |
        |--------------------|--------------|---------------------|-------------|----------|
        | 1                  | Document1    | 1                   | 1           | sandeep  |
        | 1                  | Document1    | 2                   | 1           | simi     |
        | 2                  | Document2    | 3                   | 2           | sandeep  |
        | 2                  | Document2    | 4                   | 2           | reya     |
        | 3                  | Document3    | 5                   | 3           | simi     |
        | 4                  | Document4    | NULL                | NULL        | NULL     |
        | 5                  | Document5    | NULL                | NULL        | NULL     |
    
      After applying the `WHERE` clause and selecting the listed attributes, the result will be: 
    
       | name         | id |
       |--------------|----|
       | Document1    | 1  |
       | Document2    | 3  | 
    

    b) Inside the JOIN clause (ON condition):

      SELECT documents.name, downloads.id
      FROM documents
        LEFT OUTER JOIN downloads
          ON documents.id = downloads.document_id
            AND username = 'sandeep'
    
    For the above query, the intermediate join table will look like this:
    
        | id(from documents) | name         | id (from downloads) | document_id | username |
        |--------------------|--------------|---------------------|-------------|----------|
        | 1                  | Document1    | 1                   | 1           | sandeep  |
        | 2                  | Document2    | 3                   | 2           | sandeep  |
        | 3                  | Document3    | NULL                | NULL        | NULL     |
        | 4                  | Document4    | NULL                | NULL        | NULL     |
        | 5                  | Document5    | NULL                | NULL        | NULL     |
    
    Notice how the rows in `documents` that did not satisfy both conditions are kept, with the `downloads` columns populated with `NULL` values.
    
    After selecting the listed attributes, the result will be:
    
       | name       | id   |
       |------------|------|
       |  Document1 | 1    |
       |  Document2 | 3    | 
       |  Document3 | NULL |
       |  Document4 | NULL | 
       |  Document5 | NULL | 
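
    The two result sets above can be reproduced with a small script. This is a minimal sketch that assumes SQLite (through Python's standard `sqlite3` module) as a stand-in engine; the column types and the `ORDER BY` clause are additions made here for a deterministic, runnable example and are not part of the original queries.

```python
import sqlite3

# In-memory SQLite database holding the two example tables from the answer.
# Column types are an assumption; the answer only gives names and values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE downloads (id INTEGER PRIMARY KEY, document_id INTEGER, username TEXT);
    INSERT INTO documents VALUES
        (1, 'Document1'), (2, 'Document2'), (3, 'Document3'),
        (4, 'Document4'), (5, 'Document5');
    INSERT INTO downloads VALUES
        (1, 1, 'sandeep'), (2, 1, 'simi'), (3, 2, 'sandeep'),
        (4, 2, 'reya'), (5, 3, 'simi');
""")

# a) Filter in WHERE: evaluated after the outer join, so the NULL-extended
#    rows (and the rows for other users) are discarded.
where_rows = conn.execute("""
    SELECT documents.name, downloads.id
    FROM documents
    LEFT OUTER JOIN downloads
      ON documents.id = downloads.document_id
    WHERE username = 'sandeep'
    ORDER BY documents.id
""").fetchall()

# b) Filter in ON: part of the match condition, so every document is kept
#    and unmatched documents get NULL (None in Python) for downloads.id.
on_rows = conn.execute("""
    SELECT documents.name, downloads.id
    FROM documents
    LEFT OUTER JOIN downloads
      ON documents.id = downloads.document_id
     AND username = 'sandeep'
    ORDER BY documents.id
""").fetchall()

print("WHERE:", where_rows)
print("ON:   ", on_rows)
```

    The first query returns two rows and the second returns five, matching the tables shown above.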
    
  • 2020-11-21 12:30

    For inner joins, they are equivalent, literally.

    In most open-source databases (the most notable examples being MySQL and PostgreSQL), query planning is a variant of the classic algorithm from "Access Path Selection in a Relational Database Management System" (Selinger et al., 1979). In this approach, conditions are of two types:

    • conditions referring to a single table (used for filtering)
    • conditions referring to two tables (treated as join conditions, regardless of where they appear)

    In MySQL especially, you can see for yourself, by tracing the optimizer, that the JOIN ... ON conditions are replaced during parsing by the equivalent WHERE conditions. A similar thing happens in PostgreSQL (though there is no way to see it through a log; you have to read the source description).

    Anyway, the main point is that the difference between the two syntax variants is lost during the parsing/query-rewriting phase; it does not even reach the query planning and execution phase. So there is no question about whether they are equivalent in terms of performance: they become identical long before they reach the execution phase.

    You can use EXPLAIN to verify that they produce identical plans. E.g., in PostgreSQL, the plan will contain a join clause even if you didn't use the JOIN ... ON syntax anywhere.

    Oracle and SQL Server are not open source but, as far as I know, they are based on equivalence rules (similar to those in relational algebra), and they also produce identical execution plans in both cases.

    Obviously, the two syntax styles are not equivalent for outer joins; for those you have to use the JOIN ... ON syntax.
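
    The EXPLAIN check described above can be tried out quickly. This is only a sketch: it uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module as a stand-in, not MySQL or PostgreSQL, and the `documents`/`downloads` tables, their data, and the `plan` helper are assumptions made up for the illustration. The exact plan text varies between SQLite versions, so the script only compares the three variants with each other.

```python
import sqlite3

# Hypothetical sample schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE downloads (id INTEGER PRIMARY KEY, document_id INTEGER, username TEXT);
    INSERT INTO documents VALUES (1, 'Document1'), (2, 'Document2'), (3, 'Document3');
    INSERT INTO downloads VALUES
        (1, 1, 'sandeep'), (2, 1, 'simi'), (3, 2, 'sandeep'), (4, 3, 'simi');
""")

# Three ways of writing the same INNER join.
QUERIES = {
    "join_on_plus_where": """
        SELECT d.name, dl.username
        FROM documents d JOIN downloads dl ON d.id = dl.document_id
        WHERE dl.username = 'sandeep'""",
    "filter_inside_on": """
        SELECT d.name, dl.username
        FROM documents d JOIN downloads dl
          ON d.id = dl.document_id AND dl.username = 'sandeep'""",
    "implicit_join_in_where": """
        SELECT d.name, dl.username
        FROM documents d, downloads dl
        WHERE d.id = dl.document_id AND dl.username = 'sandeep'""",
}

def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    # The last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
plans = {name: plan(sql) for name, sql in QUERIES.items()}

for name in QUERIES:
    print(name, results[name], plans[name])
```

    On other engines the equivalent check would use that engine's own EXPLAIN output.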

  • 2020-11-21 12:31

    On an inner join, they mean the same thing. However, you will get different results in an outer join depending on whether you put the join condition in the WHERE or the ON clause. Take a look at this related question and this answer (by me).

    I think it makes the most sense to be in the habit of always putting the join condition in the ON clause (unless it is an outer join and you actually do want it in the where clause) as it makes it clearer to anyone reading your query what conditions the tables are being joined on, and also it helps prevent the WHERE clause from being dozens of lines long.

  • 2020-11-21 12:32

    Normally, filtering is processed in the WHERE clause once the two tables have already been joined. It's possible, though, that you might want to filter one or both of the tables before joining them; i.e., the WHERE clause applies to the whole result set, whereas the ON clause only applies to the join in question.

  • 2020-11-21 12:33

    The way I do it is:

    • If you are doing an INNER JOIN, always put the join conditions in the ON clause, and only those: do not add filter conditions to the ON clause, put them in the WHERE clause.

    • If you are doing a LEFT JOIN, put any filter conditions on the table on the right side of the join into the ON clause. This is a must, because a WHERE condition that references the right side of the join will effectively convert the join to an INNER JOIN.

      The exception is when you are looking for the records that are not in a particular table. In that case, reference a unique identifier (one that is never NULL) of the right-side table in the WHERE clause, like this: WHERE t2.idfield IS NULL. So the only time you should reference the right-side table in the WHERE clause is to find those records that are not in that table.
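
    Both points in the list above can be illustrated with a short script. This is a minimal sketch assuming SQLite via Python's `sqlite3` module; the `documents` and `downloads` tables and their rows are hypothetical sample data, with `downloads.id` playing the role of `t2.idfield`.

```python
import sqlite3

# Hypothetical sample data: documents 4 and 5 have no downloads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE downloads (id INTEGER PRIMARY KEY, document_id INTEGER, username TEXT);
    INSERT INTO documents VALUES
        (1, 'Document1'), (2, 'Document2'), (3, 'Document3'),
        (4, 'Document4'), (5, 'Document5');
    INSERT INTO downloads VALUES
        (1, 1, 'sandeep'), (2, 1, 'simi'), (3, 2, 'sandeep'),
        (4, 2, 'reya'), (5, 3, 'simi');
""")

# A WHERE condition on the right-side table drops the NULL-extended rows,
# so this LEFT JOIN returns the same rows as an INNER JOIN would.
left_with_where = conn.execute("""
    SELECT d.name, dl.id
    FROM documents d
    LEFT JOIN downloads dl ON d.id = dl.document_id
    WHERE dl.username = 'simi'
    ORDER BY d.id
""").fetchall()

inner = conn.execute("""
    SELECT d.name, dl.id
    FROM documents d
    INNER JOIN downloads dl ON d.id = dl.document_id
    WHERE dl.username = 'simi'
    ORDER BY d.id
""").fetchall()

# The exception: test a never-NULL key of the right-side table for NULL
# to find the documents that have no download at all.
never_downloaded = [name for (name,) in conn.execute("""
    SELECT d.name
    FROM documents d
    LEFT JOIN downloads dl ON d.id = dl.document_id
    WHERE dl.id IS NULL
    ORDER BY d.id
""")]

print(left_with_where == inner, never_downloaded)
```

    The `IS NULL` test works only because `downloads.id` can never be NULL in a real row, so a NULL there can only come from the outer join's padding.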

  • 2020-11-21 12:33

    For better performance, tables should have a dedicated indexed column to use for joins.

    So if the column you are filtering on is not one of those indexed columns, I suspect it is better to keep it in the WHERE clause.

    That way you join using the indexed columns, and after the join you apply the condition on the non-indexed column.
