CROSS JOIN vs INNER JOIN in SQL

醉酒成梦 2020-11-22 03:16

What is the difference between CROSS JOIN and INNER JOIN?

CROSS JOIN:

SELECT 
    Movies.CustomerID, Movie         


        
12 Answers
  •  有刺的猬
    2020-11-22 03:53

    CROSS JOIN = (INNER) JOIN = comma (",")

    TL;DR The only difference between SQL CROSS JOIN, (INNER) JOIN and comma (",") (besides comma having lower precedence for evaluation order) is that (INNER) JOIN has an ON while CROSS JOIN and comma don't.


    Re intermediate products

    All three produce an intermediate conceptual SQL-style relational "Cartesian" product, aka cross join, of all possible combinations of a row from each table. It is ON and/or WHERE that reduce the number of rows. (SQL Fiddle)
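
A minimal sketch of this point, using Python's built-in sqlite3 module so the SQL can be run directly. The sample rows and the exact column lists for Customers and Movies are assumptions made up for illustration; only the table names come from the question. It checks that the three forms give the same full product when nothing restricts it, and the same rows when the same condition is written in ON or in WHERE.

```python
import sqlite3

# Hypothetical sample data; only the table names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Age INTEGER);
    CREATE TABLE Movies (CustomerID INTEGER, Movie TEXT);
    INSERT INTO Customers VALUES (1, 18), (2, 30), (3, 45);
    INSERT INTO Movies VALUES (1, 'Alien'), (1, 'Brazil'), (3, 'Casablanca');
""")

def rows(sql):
    """Run a query and return its rows sorted, so results compare regardless of order."""
    return sorted(conn.execute(sql).fetchall())

cols = "SELECT C.CustomerID, C.Age, M.CustomerID, M.Movie "

# With no restricting condition, all three forms give the full product (3 * 3 rows).
product_cross = rows(cols + "FROM Customers C CROSS JOIN Movies M")
product_comma = rows(cols + "FROM Customers C, Movies M")
product_inner = rows(cols + "FROM Customers C INNER JOIN Movies M ON 1 = 1")

# The same condition in ON or in WHERE keeps the same rows of that product.
cond = "C.CustomerID = M.CustomerID"
matched_inner = rows(cols + "FROM Customers C INNER JOIN Movies M ON " + cond)
matched_cross = rows(cols + "FROM Customers C CROSS JOIN Movies M WHERE " + cond)
matched_comma = rows(cols + "FROM Customers C, Movies M WHERE " + cond)

print(len(product_cross), len(matched_inner))
```

Whether a particular DBMS actually materializes the product is a separate question; the sketch only compares results.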

    The SQL Standard defines comma via product (7.5 1.b.ii), CROSS JOIN via comma (7.7 1.a) and JOIN ON via comma plus WHERE (7.7 1.b).

    As Wikipedia puts it:

    Cross join

    CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will produce rows which combine each row from the first table with each row from the second table.

    Inner join

    [...] The result of the join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate.

    The "implicit join notation" simply lists the tables for joining, in the FROM clause of the SELECT statement, using commas to separate them. Thus it specifies a cross join.

    Re OUTER JOINs and using ON vs WHERE in them see Conditions in LEFT JOIN (OUTER JOIN) vs INNER JOIN.

    Why compare columns between tables?

    When there are no duplicate rows:

    Every table holds the rows that make a true statement from a certain fill-in-the-[named-]blanks statement template. (Each such row makes a true proposition from, that is, satisfies, the table's (characteristic) predicate.)

    • A base table holds the rows that make a true statement from some DBA-given statement template:

      /* rows where
      customer C.CustomerID has age C.Age and ...
      */
      FROM Customers C
      
    • A join's intermediate product holds the rows that make a true statement from the AND of its operands' templates:

      /* rows where
          customer C.CustomerID has age C.Age and ...
      AND movie M.Movie is rented by customer M.CustomerID and ...
      */
      FROM Customers C CROSS JOIN Movies M
      
    • ON & WHERE conditions are ANDed in to give a further template. The value is again the rows that satisfy that template:

      /* rows where
          customer C.CustomerID has age C.Age and ...
      AND movie M.Movie is rented by customer M.CustomerID and ...
      AND C.CustomerID = M.CustomerID
      AND C.Age >= M.[Minimum Age]
      AND C.Age = 18
      */
      FROM Customers C INNER JOIN Movies M
      ON C.CustomerID = M.CustomerID
      AND C.Age >= M.[Minimum Age]
      WHERE C.Age = 18
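
The three templates above can be tried on a small data set. This is a sketch under assumptions: the rows are invented, the tables carry only the columns the conditions mention, and it runs on SQLite through Python's sqlite3 module (SQLite accepts the bracketed `[Minimum Age]` identifier). It runs the INNER JOIN query as written and a CROSS JOIN with every condition moved into WHERE, which states the same ANDed template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Age INTEGER);
    CREATE TABLE Movies (CustomerID INTEGER, Movie TEXT, [Minimum Age] INTEGER);
    -- Hypothetical rows chosen so that each condition removes something.
    INSERT INTO Customers VALUES (1, 18), (2, 18), (3, 40);
    INSERT INTO Movies VALUES
        (1, 'Alien', 18),       -- customer 1 is 18: satisfies every condition
        (2, 'Brazil', 21),      -- customer 2 is 18: fails C.Age >= M.[Minimum Age]
        (3, 'Casablanca', 12);  -- customer 3 is 40: fails C.Age = 18
""")

# The query from the text: conditions split between ON and WHERE.
result = conn.execute("""
    SELECT C.CustomerID, M.Movie
    FROM Customers C INNER JOIN Movies M
    ON C.CustomerID = M.CustomerID
    AND C.Age >= M.[Minimum Age]
    WHERE C.Age = 18
""").fetchall()

# The same template with every condition ANDed in WHERE over the product.
same = conn.execute("""
    SELECT C.CustomerID, M.Movie
    FROM Customers C CROSS JOIN Movies M
    WHERE C.CustomerID = M.CustomerID
    AND C.Age >= M.[Minimum Age]
    AND C.Age = 18
""").fetchall()

print(result)
```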
      

    In particular, comparing columns for (SQL) equality between tables means that each row kept from the product has the same (non-NULL) value in those columns of the two joined tables' parts of the template. It's just coincidental that equality comparisons between tables typically remove a lot of rows; what is necessary and sufficient is to characterize the rows you want.
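
The "(non-NULL)" qualifier can be seen in a small sketch, again with invented rows and SQLite via Python's sqlite3 module as assumptions. A NULL CustomerID on both sides does not count as an equal value, because `NULL = NULL` does not evaluate to true in SQL, so that pair is dropped from the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Age INTEGER);
    CREATE TABLE Movies (CustomerID INTEGER, Movie TEXT);
    -- Hypothetical rows: one matching pair and one pair of NULLs.
    INSERT INTO Customers VALUES (1, 18), (NULL, 30);
    INSERT INTO Movies VALUES (1, 'Alien'), (NULL, 'Brazil');
""")

# The intermediate product has 2 * 2 rows, including the NULL/NULL pair.
product_size = conn.execute(
    "SELECT COUNT(*) FROM Customers C CROSS JOIN Movies M"
).fetchone()[0]

# Equality keeps only pairs whose CustomerID values are equal and non-NULL.
matched = conn.execute("""
    SELECT C.CustomerID, M.Movie
    FROM Customers C INNER JOIN Movies M
    ON C.CustomerID = M.CustomerID
""").fetchall()

print(product_size, matched)
```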

    Just write SQL for the template for the rows you want!

    Re the meaning of queries (and tables vs conditions) see:
    How to get matching data from another SQL table for two different columns: Inner Join and/or Union?
    Is there any rule of thumb to construct SQL query from a human-readable description?

    Overloading "cross join"

    Unfortunately the term "cross join" gets used for:

    • The intermediate product.
    • CROSS JOIN.
    • (INNER) JOIN with an ON or WHERE that doesn't compare any columns from one table to any columns of another. (Since that tends to return so many of the intermediate product rows.)

    These various meanings get confounded. (Eg as in other answers and comments here.)

    Using CROSS JOIN vs (INNER) JOIN vs comma

    The common convention is:

    • Use CROSS JOIN when and only when you don't compare columns between tables. That is to show that the lack of comparisons was intentional.
    • Use (INNER) JOIN with ON when and only when you compare columns between tables. (Plus possibly other conditions.)
    • Don't use comma.

    Typically also conditions not on pairs of tables are kept for a WHERE. But they may have to be put in a(n INNER) JOIN ON to get appropriate rows for the argument to a RIGHT, LEFT or FULL (OUTER) JOIN.
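
A sketch of why placement can matter once an outer join is involved. Everything here beyond Customers and Movies is an assumption for illustration: the Ratings table, its columns, the sample rows, and the derived-table alias G are hypothetical. The single-table condition `R.Stars >= 4` is first placed in the INNER JOIN ON that builds the right-hand argument of a LEFT JOIN, then moved to the outer WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Age INTEGER);
    CREATE TABLE Movies (CustomerID INTEGER, Movie TEXT);
    CREATE TABLE Ratings (Movie TEXT, Stars INTEGER);  -- hypothetical table
    INSERT INTO Customers VALUES (1, 18), (2, 30);
    INSERT INTO Movies VALUES (1, 'Alien'), (2, 'Brazil');
    INSERT INTO Ratings VALUES ('Alien', 5), ('Brazil', 2);
""")

# Condition inside the INNER JOIN ON: the LEFT JOIN argument holds only
# well-rated rentals, and customers without one are kept with NULLs.
in_on = conn.execute("""
    SELECT C.CustomerID, G.Movie
    FROM Customers C
    LEFT JOIN (
        SELECT M.CustomerID AS CustomerID, M.Movie AS Movie
        FROM Movies M INNER JOIN Ratings R
        ON M.Movie = R.Movie AND R.Stars >= 4
    ) G ON C.CustomerID = G.CustomerID
    ORDER BY C.CustomerID
""").fetchall()

# Condition in the outer WHERE: it is applied after the LEFT JOIN, so
# customers whose only rentals are poorly rated disappear from the result.
in_where = conn.execute("""
    SELECT C.CustomerID, G.Movie
    FROM Customers C
    LEFT JOIN (
        SELECT M.CustomerID AS CustomerID, M.Movie AS Movie, R.Stars AS Stars
        FROM Movies M INNER JOIN Ratings R
        ON M.Movie = R.Movie
    ) G ON C.CustomerID = G.CustomerID
    WHERE G.Stars >= 4
    ORDER BY C.CustomerID
""").fetchall()

print(in_on, in_where)
```

Which of the two results is "appropriate" depends on the rows you want; the sketch only shows that the two placements describe different templates.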

    Re "Don't use comma": Mixing comma with explicit JOIN can mislead because comma has lower precedence. But given the role of the intermediate product in the meaning of CROSS JOIN, (INNER) JOIN and comma, arguments for the convention above of not using it at all are shaky. A CROSS JOIN or comma is just like an (INNER) JOIN that's ON a TRUE condition. An intermediate product, ON and WHERE all introduce an AND in the corresponding predicate. However else INNER JOIN ON can be thought of--say, generating an output row only when finding a pair of input rows that satisfies the ON condition--it nevertheless returns the cross join rows that satisfy the condition. The only reason ON had to supplement comma in SQL was to write OUTER JOINs. Of course, an expression should make its meaning clear; but what is clear depends on what things are taken to mean.

    Re Venn diagrams: A Venn diagram with two intersecting circles can illustrate the difference between output rows for INNER, LEFT, RIGHT & FULL JOINs for the same input. And when the ON is unconditionally TRUE, the INNER JOIN result is the same as CROSS JOIN. Also it can illustrate the input and output rows for INTERSECT, UNION & EXCEPT. And when both inputs have the same columns, the INTERSECT result is the same as for standard SQL NATURAL JOIN, and the EXCEPT result is the same as for certain idioms involving LEFT & RIGHT JOIN. But it does not illustrate how (INNER) JOIN works in general. That just seems plausible at first glance. It can identify parts of input and/or output for special cases of ON, PKs (primary keys), FKs (foreign keys) and/or SELECT. All you have to do to see this is to identify what exactly are the elements of the sets represented by the circles. (Which muddled presentations never make clear.) (Remember that in general for joins output rows have different headings from input rows. And SQL tables are bags not sets of rows with NULLs.)
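
The INTERSECT and NATURAL JOIN remark, and the closing caveat about bags, can be checked with a short sketch. The tables A and B, their columns x and y, and the rows are hypothetical, and the run uses SQLite through Python's sqlite3 module. With no duplicates and no NULLs the two results agree; after one duplicate row is added they no longer do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with the same columns, no duplicates, no NULLs.
    CREATE TABLE A (x INTEGER, y TEXT);
    CREATE TABLE B (x INTEGER, y TEXT);
    INSERT INTO A VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO B VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

INTERSECT_SQL = "SELECT x, y FROM A INTERSECT SELECT x, y FROM B ORDER BY x, y"
NATURAL_SQL = "SELECT A.x, A.y FROM A NATURAL JOIN B ORDER BY A.x, A.y"

# On this clean input the two queries return the same rows.
intersect_rows = conn.execute(INTERSECT_SQL).fetchall()
natural_rows = conn.execute(NATURAL_SQL).fetchall()

# Tables are bags: add a duplicate row to A and run both queries again.
conn.execute("INSERT INTO A VALUES (2, 'b')")
intersect_with_dup = conn.execute(INTERSECT_SQL).fetchall()  # duplicates removed
natural_with_dup = conn.execute(NATURAL_SQL).fetchall()      # one row per matching pair

print(intersect_with_dup, natural_with_dup)
```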
