I am learning SQL and am trying to learn JOINs this week.
I have gotten to the level where I can do three table joins, similar to a lot of examples I\'ve seen. I\'
Yes, I do use all three of those JOINs, although I tend to stick to using just LEFT (OUTER) JOIN
s instead of inter-mixing LEFT and RIGHT JOINs. I also use FULL OUTER JOIN
s and CROSS JOIN
s.
In summary, an INNER JOIN
restricts the resultset only to those records satisfied by the JOIN condition. Consider the following tables
EDIT: I've renamed the Table names and prefix them with @
so that Table Variables can be used for anyone reading this answer and wanting to experiment.
If you'd also like to experiment with this in the browser, I've set this all up on SQL Fiddle too;
@Table1
id | name
---------
1 | One
2 | Two
3 | Three
4 | Four
@Table2
id | name
---------
1 | Partridge
2 | Turtle Doves
3 | French Hens
5 | Gold Rings
SQL code
DECLARE @Table1 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))
INSERT INTO @Table1 VALUES(1, 'One');
INSERT INTO @Table1 VALUES(2, 'Two');
INSERT INTO @Table1 VALUES(3, 'Three');
INSERT INTO @Table1 VALUES(4, 'Four');
DECLARE @Table2 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))
INSERT INTO @Table2 VALUES(1, 'Partridge');
INSERT INTO @Table2 VALUES(2, 'Turtle Doves');
INSERT INTO @Table2 VALUES(3, 'French Hens');
INSERT INTO @Table2 VALUES(5, 'Gold Rings');
An INNER JOIN
SQL Statement, joined on the id
field
SELECT
t1.id,
t1.name,
t2.name
FROM
@Table1 t1
INNER JOIN
@Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
A LEFT JOIN
will return a resultset with all records from the table on the left hand side of the join (if you were to write out the statement as a one liner, the table that appears first) and fields from the table on the right side of the join that match the join expression and are included in the SELECT
clause. Missing details will be populated with NULL
SELECT
t1.id,
t1.name,
t2.name
FROM
@Table1 t1
LEFT JOIN
@Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
4 | Four | NULL
A RIGHT JOIN
is the same logic as a LEFT JOIN
but will return all records from the right-hand side of the join and fields from the left side that match the join expression and are included in the SELECT
clause.
SELECT
t1.id,
t1.name,
t2.name
FROM
@Table1 t1
RIGHT JOIN
@Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
NULL| NULL| Gold Rings
Of course, there is also the FULL OUTER JOIN
, which includes records from both joined tables and populates any missing details with NULL.
SELECT
t1.id,
t1.name,
t2.name
FROM
@Table1 t1
FULL OUTER JOIN
@Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
4 | Four | NULL
NULL| NULL| Gold Rings
And a CROSS JOIN
(also known as a CARTESIAN PRODUCT
), which is simply the product of cross applying fields in the SELECT
statement from one table with the fields in the SELECT
statement from the other table. Notice that there is no join expression in a CROSS JOIN
SELECT
t1.id,
t1.name,
t2.name
FROM
@Table1 t1
CROSS JOIN
@Table2 t2
Results in
id | name | name
------------------
1 | One | Partridge
2 | Two | Partridge
3 | Three | Partridge
4 | Four | Partridge
1 | One | Turtle Doves
2 | Two | Turtle Doves
3 | Three | Turtle Doves
4 | Four | Turtle Doves
1 | One | French Hens
2 | Two | French Hens
3 | Three | French Hens
4 | Four | French Hens
1 | One | Gold Rings
2 | Two | Gold Rings
3 | Three | Gold Rings
4 | Four | Gold Rings
EDIT:
Imagine there is now a Table3
@Table3
id | name
---------
2 | Prime 1
3 | Prime 2
5 | Prime 3
The SQL code
DECLARE @Table3 TABLE (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25))
INSERT INTO @Table3 VALUES(2, 'Prime 1');
INSERT INTO @Table3 VALUES(3, 'Prime 2');
INSERT INTO @Table3 VALUES(5, 'Prime 3');
Now all three tables joined with INNER JOINS
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
@Table1 t1
INNER JOIN
@Table2 t2
ON
t1.id = t2.id
INNER JOIN
@Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
It might help to understand this result by thinking that records with id 2 and 3 are the only ones common to all 3 tables and are also the field we are joining each table on.
Now all three with LEFT JOINS
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
@Table1 t1
LEFT JOIN
@Table2 t2
ON
t1.id = t2.id
LEFT JOIN
@Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
1 | One | Partridge | NULL
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
4 | Four | NULL | NULL
Joel's answer is a good explanation for explaining this resultset (Table1 is the base/origin table).
Now with a INNER JOIN
and a LEFT JOIN
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
@Table1 t1
INNER JOIN
@Table2 t2
ON
t1.id = t2.id
LEFT JOIN
@Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
1 | One | Partridge | NULL
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
Although we do not know the order in which the query optimiser will perform the operations, we will look at this query from top to bottom to understand the resultset. The INNER JOIN
on ids between Table1 and Table2 will restrict the resultset to only those records satisfied by the join condition i.e. the three rows that we saw in the very first example. This temporary resultset will then be LEFT JOIN
ed to Table3 on ids between Table1 and Tables; There are records in Table3 with id 2 and 3, but not id 1, so t3.name field will have details in for 2 and 3 but not 1.
Joins are just ways of combining tables. Joining three tables is no different than joining 2... or 200. You can mix and match INNER, [LEFT/RIGHT/FULL] OUTER, and even CROSS joins as much as you want. The only difference is which results are kept: INNER joins only keep rows where both sides match the expression. OUTER joins pick an "origin" table depending on the LEFT/RIGHT/FULL specification, always keep all rows from the origin table, and supply NULL values for rows from the other side that don't match the expression. CROSS joins return all possible combinations of both sides.
The trick is that because you're working with declarative code rather than more-familiar iterative, the temptation is to try to think of it as if everything happens at once. When you do that, you try to wrap your head around the entire query and it can get confusing.
Instead, you want to think of it as if the joins happen in order, from the first table listed to the last. This actually is not how it works, because the query optimizer can re-order things to make them run faster. But it makes building the query easier for the developer.
So with three tables, you start with your base table, then join in the values you need from the next table, and the next, and so on, just like adding lines of code to a function to produce the required output.
As for using the different join types, I've used all the different types I listed here: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. But most of those you only need to use occasionally. INNER JOIN and LEFT JOIN will cover probably 95% or more of what you want to do.
Now let's talk about performance. Often times the order you list tables is dictated to you: you start from TableA
and you need to list TableB
first in order to have access to columns required to join in TableC
. But sometimes both TableB
and TableC
only depend on TableA
, and you could list them in either order. When that happens the query optimizer will usually pick the best order for you, but sometimes it doesn't know how. Even if it did, it helps to have a good system for listing tables so you can always look at a query and know that it's "right".
With that in mind, you should think of a query in terms of the working set
currently in memory as the query builds. When you start with TableA
, the database looks at all the columns from TableA
in the select list or anywhere else (like WHERE or ORDER BY clauses, or potential indexes) in the query, factors in relevant conditions from the WHERE clause, and loads the smallest portion of that table into memory that it can get away with. It does this for each table in turn, always loading as little as possible. And that's the key: you want to keep this working set as small as possible for as long as possible.
So, going back to our three-table join, we want to list the tables in the order that will keep the working set smaller for longer. This means listing the smaller table above the larger one. Another good rule of thumb is that INNER joins tend to shrink result sets, while OUTER joins, tend to grow result sets, and so you want to list your INNER joins first. However, this is not a requirement for a query to work, nor is it always true; sometimes the reverse can happen as well.
Finally, I want to point out again that this isn't how it really works. The query optimizer and execution plans are a very complex topic, and there are lots of tricks the database can take that break this model from time to time. It's just one model that you as a developer can use to help understand what the server is doing and help you write better queries.