Here is my table structure: SQL Fiddle
CREATE TABLE mytable (
id int,
related int
);
INSERT into mytable VALUES(1, NULL);
INSERT into mytable VALUES
In the first case, we need to match values 1, 2, 3 with NULL, 1 and 1. Since it is a left join, NULL will stay with no match and the 1's will be matched with 1 from the other table, thus 3 records.
In the second case, we have values 1, 2, 3. 2 and 3 have no match and will result in two rows, but 1 has 2 matches and will result in 2 additional rows, which is 4 rows.
Generally, having:
... LeftTable [LT] left join RightTable [RT] on [LT].[joinCol] = [RT].[joinCol] ...
will work like this:
Take all values from LT.joinCol and try to match them with values in RT.joinCol. If some value has n matches in RT.joinCol, then it will result in n rows. If the row has no match, it will still result in one unmatched record.
In your 1st case, 2 values have 1 match => 1 + 1 = 2 records. One value has no match => 1 record, 2 + 1 = 3.
In your 2nd case, 2 values have no match => thus 2 records, one value has 2 matches => 2 records, 2 + 2 = 4 :)
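The counting rule above can be sketched in a few lines of Python. This is only an illustration, not how a database engine works internally: the function name `left_join_row_count` is made up for this example, and the sample data assumes the table holds the rows (1, NULL), (2, 1), (4, 1), with `None` standing in for NULL.

```python
from collections import Counter

def left_join_row_count(left_values, right_values):
    """Count the rows of LEFT JOIN ... ON left = right (hypothetical helper)."""
    # In SQL, NULL = anything is never true, so NULLs on the right match nothing.
    right_counts = Counter(v for v in right_values if v is not None)
    total = 0
    for value in left_values:
        # A NULL on the left also matches nothing.
        matches = 0 if value is None else right_counts[value]
        # n matches give n rows; zero matches still give one NULL-padded row.
        total += max(matches, 1)
    return total

ids = [1, 2, 4]            # assumed contents of the id column
related = [None, 1, 1]     # assumed contents of the related column

first_case = left_join_row_count(related, ids)    # ON t1.related = t2.id
second_case = left_join_row_count(ids, related)   # ON t1.id = t2.related
print(first_case, second_case)
```

Under those assumptions the two cases come out as 3 and 4 rows, matching the arithmetic above.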
LEFT JOIN returns all tuples from the left table, even those with no match in the right table, plus the matched values from the right table.
SELECT t1.id as t1_id, t1.related as t1_r, t2.id as t2_id, t2.related as t2_r
FROM mytable as t1
LEFT JOIN mytable as t2 ON t1.related = t2.id;
returns
t1_id t1_r t2_id t2_r
----------------------------
1 null null null
2 1 1 null
4 1 1 null
The (1, null) tuple of t1 matches no tuple from t2; the (2, 1) tuple of t1 matches one tuple of t2, (1, null), and so does (4, 1); hence 3 rows in the result.
Whereas
SELECT t1.id as t1_id, t1.related as t1_r, t2.id as t2_id, t2.related as t2_r
FROM mytable as t1
LEFT JOIN mytable as t2 ON t1.id = t2.related;
returns
t1_id t1_r t2_id t2_r
-----------------------------
1 null 2 1
1 null 4 1
2 1 null null
4 1 null null
Here (1, null) of t1 matches two tuples of t2, (2, 1) and (4, 1), while (2, 1) and (4, 1) match no tuple; hence 4 rows.
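If you want to reproduce both result sets without a database server, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. It assumes the table contains the rows (1, NULL), (2, 1), (4, 1); the `ORDER BY` is added only to make the output order deterministic and is not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id int, related int)")
# Assumed sample rows; None is stored as SQL NULL.
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, None), (2, 1), (4, 1)])

columns = "t1.id, t1.related, t2.id, t2.related"

# First query: join on t1.related = t2.id
first = conn.execute(
    f"SELECT {columns} FROM mytable AS t1 "
    "LEFT JOIN mytable AS t2 ON t1.related = t2.id "
    "ORDER BY t1.id, t2.id"
).fetchall()

# Second query: join on t1.id = t2.related
second = conn.execute(
    f"SELECT {columns} FROM mytable AS t1 "
    "LEFT JOIN mytable AS t2 ON t1.id = t2.related "
    "ORDER BY t1.id, t2.id"
).fetchall()

for row in first:
    print(row)
print("---")
for row in second:
    print(row)
```

The first list should have 3 tuples and the second 4, with `None` wherever the right side had no match.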
LEFT JOIN means to grab all the rows from the left table, and to return rows from the right table only where there is a match. If there is no match, NULL is returned.
Let's look at what the dataset looks like when joined to itself with no condition. (Note, the asterisks and pluses are referred to below.)
+-------+------------+-------+------------+
| t1.id | t1.related | t2.id | t2.related |
+-------+------------+-------+------------+
| 1 | NULL | 1 | NULL |
+| 1 | NULL | 2 | 1 |
+| 1 | NULL | 4 | 1 |
*| 2 | 1 | 1 | NULL |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 4 | 1 |
*| 4 | 1 | 1 | NULL |
| 4 | 1 | 2 | 1 |
| 4 | 1 | 4 | 1 |
+-------+------------+-------+------------+
The left table is t1 in both cases.
In the first query, we look for matches of t1.related = t2.id (the middle two columns). That corresponds to the rows marked with an asterisk. However, there was no match for t1.id = 1, so we need to include this row, but with NULL, because that's what LEFT JOIN means (no match still returns NULL).
+-------+-------+
| t1.id | t2.id |
+-------+-------+
| 1 | NULL | added because LEFT JOIN
| 2 | 1 | marked with * in table above
| 4 | 1 | marked with * in table above
+-------+-------+
In the second query, we look for matches of t1.id = t2.related (the outer two columns). That corresponds to the rows marked with a plus. However, there was no match for t1.id = 2 and t1.id = 4, so we need to include these rows, but with NULL, because that's what LEFT JOIN means (no match still returns NULL).
+-------+-------+
| t1.id | t2.id |
+-------+-------+
| 1 | 2 | marked with + in table above
| 1 | 4 | marked with + in table above
| 2 | NULL | added because LEFT JOIN
| 4 | NULL | added because LEFT JOIN
+-------+-------+
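The "pair everything, filter, then add back the unmatched left rows" view can be written out directly. The following is a conceptual sketch only (real engines do not materialise the full cross product); the helper names `sql_equal` and `left_join` are invented for this example, and rows are `(id, related)` tuples with `None` for NULL.

```python
from itertools import product

rows = [(1, None), (2, 1), (4, 1)]  # (id, related), assumed sample data

def sql_equal(a, b):
    # SQL's = is never true when either side is NULL.
    return a is not None and b is not None and a == b

def left_join(left, right, on):
    # Step 1: the unconditioned join, every left row paired with every right row.
    pairs = list(product(range(len(left)), range(len(right))))
    # Step 2: keep only the pairs satisfying the join condition.
    matched = [(i, j) for i, j in pairs if on(left[i], right[j])]
    # Step 3: add back left rows that matched nothing, with no right-hand row.
    matched_left = {i for i, _ in matched}
    unmatched = [(i, None) for i in range(len(left)) if i not in matched_left]
    # Project to (t1.id, t2.id), using None where the right side is missing.
    return [(left[i][0], None if j is None else right[j][0])
            for i, j in matched + unmatched]

first = left_join(rows, rows, lambda l, r: sql_equal(l[1], r[0]))   # t1.related = t2.id
second = left_join(rows, rows, lambda l, r: sql_equal(l[0], r[1]))  # t1.id = t2.related
print(first)
print(second)
```

Note that this sketch lists the matched pairs first and the NULL-padded rows last, so the row order can differ from the tables above; SQL makes no ordering promise without `ORDER BY` anyway.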
First query: t1.related = t2.id
 t1           | joined t2
 id  related  | id  related
--------------+-------------
  1  NULL     | --  --
  2  1        |  1  NULL
  3  1        |  1  NULL
An inner join would result in only two rows, but the outer join also preserves the first row that has no match.
Second query: t1.id = t2.related
 t1           | joined t2
 id  related  | id  related
--------------+-------------
  1  NULL     |  2  1
  1  NULL     |  3  1
  2  1        | --  --
  3  1        | --  --
Here too, an inner join would result in only two rows, but the outer join also preserves the two rows that have no match.
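To check the inner-versus-outer row counts yourself, here is a small sketch using Python's built-in `sqlite3` module. It assumes the same three rows shown in the tables above, (1, NULL), (2, 1), (3, 1), in a table named `mytable`; the helper `count_rows` is just for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id int, related int)")
# Assumed sample rows, matching the ids used in the tables above.
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, None), (2, 1), (3, 1)])

def count_rows(join_type, condition):
    # join_type and condition are fixed strings from this script, not user input.
    sql = (f"SELECT COUNT(*) FROM mytable AS t1 "
           f"{join_type} JOIN mytable AS t2 ON {condition}")
    return conn.execute(sql).fetchone()[0]

conditions = ("t1.related = t2.id", "t1.id = t2.related")
counts = {(join_type, cond): count_rows(join_type, cond)
          for join_type in ("INNER", "LEFT")
          for cond in conditions}

for key, value in counts.items():
    print(key, value)
```

Both inner joins should report 2 rows, while the left joins report 3 and 4; the difference is exactly the number of left rows without a match.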
The best way to view a self join is to create two tables and then look at the join conditions.
Table t1
Id Related
1 null
2 1
4 1
Table t2
Id Related
1 null
2 1
4 1
Note: Left Join means every row from the left table will come through even if the join condition does not match. In that case the columns from the right table will come as null.
First Query: t1.related = t2.id; (Columns selected "t1.id, t2.id")
1.) Let's take the first row from table t1; its related column is null. null has no match in the id column of table t2. As it is a left join, the row from t1 still comes through.
First Row:
t1_id t2_id
1 null
2.) Let's take the second row from table t1; its related column is 1. 1 has one match in the id column of table t2, so one row comes from the join condition.
Second Row:
t1_id t2_id
2 1
3.) Let's take the third row from table t1; its related column is 1. 1 has one match in the id column of table t2, so one row comes from the join condition.
Third Row:
t1_id t2_id
4 1
Second Query: t1.id = t2.related (Columns selected "t1.id, t2.id")
1.) Let's take the first row from table t1; its id column is 1. 1 appears in 2 rows of the related column of table t2, so two rows are selected.
t1.id t2.id
1 2
1 4
2.) Let's take the second row from table t1; its id column is 2. 2 appears in 0 rows of the related column of table t2, but as it is a left join, the row from t1 still comes through.
t1.id t2.id
1 2
1 4
2 null
3.) Let's take the third row from table t1; its id column is 4. 4 appears in 0 rows of the related column of table t2, but as it is a left join, the row from t1 still comes through.
t1.id t2.id
1 2
1 4
2 null
4 null
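The row-by-row walk above is essentially a nested loop, which can be sketched in Python as follows. This is only a teaching illustration: the function name `left_join_walk` is made up, rows are plain dicts, and `None` stands in for null.

```python
t1 = [
    {"id": 1, "related": None},
    {"id": 2, "related": 1},
    {"id": 4, "related": 1},
]
t2 = list(t1)  # a self join: the right table has the same rows

def left_join_walk(left_rows, right_rows, left_col, right_col):
    result = []
    for left in left_rows:                      # take each row of t1 in turn
        key = left[left_col]
        # null never matches anything, so a null key finds no rows.
        matches = [right for right in right_rows
                   if key is not None and key == right[right_col]]
        if matches:
            # one output row per matching row of t2
            result.extend((left["id"], right["id"]) for right in matches)
        else:
            # left join: the t1 row still comes through, with null for t2
            result.append((left["id"], None))
    return result

first = left_join_walk(t1, t2, "related", "id")    # t1.related = t2.id
second = left_join_walk(t1, t2, "id", "related")   # t1.id = t2.related
print(first)
print(second)
```

Each pass of the outer loop corresponds to one numbered step above, and the selected columns are (t1.id, t2.id) as in the walkthrough.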
Hope this helps you understand.
Thanks Ankit.