What\'s the difference between:
select t1.a1, t1.a2, t1.a3 from t1 cross join t2 where t1.a3 = t2.a1
and:
select t1.a1, t1.a2,
MySQL doesn't offer a distinction between JOIN
and CROSS JOIN
. They are the same.
In both your examples the clause
WHERE t1.a3 = t2.a1
converts any sort of join into an inner join. The standard way of expressing this query is
SELECT t1.a1, t1.a2, t1.a3
FROM t1
JOIN t2 ON t1.a3 = t2.a1
There's no difference, although one can smile about it a little. You make a cross join first and with the where clause you convert it to an inner join (at least with the first one). This way of writing joins is more than 20 years old and kind of error prone in complex statements. I'd recommend to use the new funky way of writing it like this:
select t1.a1, t1.a2, t1.a3
from t1
inner join t2 on t1.a3 = t2.a1
SQL has the following types of joins, all of which come straight from set theory:
Inner join.
From A inner join B
is the equivalent of A ∩ B, providing the set of elements common to both sets.
Left outer join.
From A left outer join B
is the equivalent of (A − B) ∪ (A ∩ B). Each A will appear at least once; if there are multiple matching Bs, the A will be repeated once per matching B.
Right outer join.
From A right outer join B
is the equivalent of (A ∩ B) ∪ (B − A). It is identical to a left join with the tables trading places. Each B will appear at least once; if there are multiple matching As, each B will be repeated once per matching B.
Full outer join.
From A full outer join B
is the equivalent of (A − B) ∪ (A ∩ B) ∪ (B − A). Each A and each B will appear at least once. If an A matches multiple Bs it will be repeated once per match; if a B matches multiple As it will be repeated once per match.
Cross Join.
From A cross join B
is produces the cartesian product A × B. Each A will be repeated once for every B. If A has 100 rows and B has 100 rows, the result set will consist of 10,000 rows.
It should be noted that the theoretical execution of a select
query consists of the following steps performed in this order:
Compute the full cartesian product of the source set(s) in the from
clause to prime the candidate result set.
Apply the join criteria in the from
clause and reduce the candidate result set.
Apply the criteria in the where clause
to further reduce the candidate result set.
partition the candidate result set into groups based on the criteria in the group by
clause.
Remove from the candidate result set any columns other than those involved in the group by
clause or involved in the evaluation of an aggregate function.
Compute the value of any such aggregate functions for each group in the candidate result set.
Collapse each group in the candidate result set into a single row consisting of the grouping columns and the computed values for each aggregate function. The candidate result set now consists of one row for each group, with all columns other than the group by
columns or the compute values of aggregate functions for the group are eliminated.
Apply the criteria in the having
clause to reduce the candidate result set and produce the final result set.
Order the final result set by the criteria in order by
clause and emit it.
There are more steps, having to do with things like compute
and compute by
clauses, but this is sufficient to get the theoretical notion of how it works.
It should also be noted that nothing but the most naïve implementation would actually evaluate a select
statement this way, but the results produced must be the same as if the above steps were performed in full.