I would like to know the following:
This answer covers:
There are a number of ways to retrieve data from multiple tables in a database. In this answer, I will be using ANSI-92 join syntax. This may be different to a number of other tutorials out there which use the older ANSI-89 syntax (and if you are used to 89, may seem much less intuitive - but all I can say is to try it) as it is much easier to understand when the queries start getting more complex. Why use it? Is there a performance gain? The short answer is no, but it is easier to read once you get used to it. It is easier to read queries written by other folks using this syntax.
I am also going to use the concept of a small caryard which has a database to keep track of what cars it has available. The owner has hired you as his IT Computer guy and expects you to be able to drop him the data that he asks for at the drop of a hat.
I have made a number of lookup tables that will be used by the final table. This will give us a reasonable model to work from. To start off, I will be running my queries against an example database that has the following structure. I will try to think of common mistakes that are made when starting out and explain what goes wrong with them - as well as of course showing how to correct them.
The first table is simply a color listing so that we know what colors we have in the car yard.
mysql> create table colors(id int(3) not null auto_increment primary key,
-> color varchar(15), paint varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
mysql> insert into colors (color, paint) values ('Red', 'Metallic'),
-> ('Green', 'Gloss'), ('Blue', 'Metallic'),
-> ('White' 'Gloss'), ('Black' 'Gloss');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from colors;
+----+-------+----------+
| id | color | paint |
+----+-------+----------+
| 1 | Red | Metallic |
| 2 | Green | Gloss |
| 3 | Blue | Metallic |
| 4 | White | Gloss |
| 5 | Black | Gloss |
+----+-------+----------+
5 rows in set (0.00 sec)
The brands table identifies the different brands of the cars out caryard could possibly sell.
mysql> create table brands (id int(3) not null auto_increment primary key,
-> brand varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from brands;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| brand | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> insert into brands (brand) values ('Ford'), ('Toyota'),
-> ('Nissan'), ('Smart'), ('BMW');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from brands;
+----+--------+
| id | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 3 | Nissan |
| 4 | Smart |
| 5 | BMW |
+----+--------+
5 rows in set (0.00 sec)
The model table will cover off different types of cars, it is going to be simpler for this to use different car types rather than actual car models.
mysql> create table models (id int(3) not null auto_increment primary key,
-> model varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from models;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| model | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from models;
+----+--------+
| id | model |
+----+--------+
| 1 | Sports |
| 2 | Sedan |
| 3 | 4WD |
| 4 | Luxury |
+----+--------+
4 rows in set (0.00 sec)
And finally, to tie up all these other tables, the table that ties everything together. The ID field is actually the unique lot number used to identify cars.
mysql> create table cars (id int(3) not null auto_increment primary key,
-> color int(3), brand int(3), model int(3));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from cars;
+-------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | int(3) | YES | | NULL | |
| brand | int(3) | YES | | NULL | |
| model | int(3) | YES | | NULL | |
+-------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
-> (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> select * from cars;
+----+-------+-------+-------+
| id | color | brand | model |
+----+-------+-------+-------+
| 1 | 1 | 2 | 1 |
| 2 | 3 | 1 | 2 |
| 3 | 5 | 3 | 1 |
| 4 | 4 | 4 | 2 |
| 5 | 2 | 2 | 3 |
| 6 | 3 | 5 | 4 |
| 7 | 4 | 1 | 3 |
| 8 | 2 | 2 | 1 |
| 9 | 5 | 2 | 3 |
| 10 | 4 | 5 | 1 |
+----+-------+-------+-------+
10 rows in set (0.00 sec)
This will give us enough data (I hope) to cover off the examples below of different types of joins and also give enough data to make them worthwhile.
So getting into the grit of it, the boss wants to know The IDs of all the sports cars he has.
This is a simple two table join. We have a table that identifies the model and the table with the available stock in it. As you can see, the data in the model
column of the cars
table relates to the models
column of the cars
table we have. Now, we know that the models table has an ID of 1
for Sports
so lets write the join.
select
ID,
model
from
cars
join models
on model=ID
So this query looks good right? We have identified the two tables and contain the information we need and use a join that correctly identifies what columns to join on.
ERROR 1052 (23000): Column 'ID' in field list is ambiguous
Oh noes! An error in our first query! Yes, and it is a plum. You see, the query has indeed got the right columns, but some of them exist in both tables, so the database gets confused about what actual column we mean and where. There are two solutions to solve this. The first is nice and simple, we can use tableName.columnName
to tell the database exactly what we mean, like this:
select
cars.ID,
models.model
from
cars
join models
on cars.model=models.ID
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
| 2 | Sedan |
| 4 | Sedan |
| 5 | 4WD |
| 7 | 4WD |
| 9 | 4WD |
| 6 | Luxury |
+----+--------+
10 rows in set (0.00 sec)
The other is probably more often used and is called table aliasing. The tables in this example have nice and short simple names, but typing out something like KPI_DAILY_SALES_BY_DEPARTMENT
would probably get old quickly, so a simple way is to nickname the table like this:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
Now, back to the request. As you can see we have the information we need, but we also have information that wasn't asked for, so we need to include a where clause in the statement to only get the Sports cars as was asked. As I prefer the table alias method rather than using the table names over and over, I will stick to it from this point onwards.
Clearly, we need to add a where clause to our query. We can identify Sports cars either by ID=1
or model='Sports'
. As the ID is indexed and the primary key (and it happens to be less typing), lets use that in our query.
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Bingo! The boss is happy. Of course, being a boss and never being happy with what he asked for, he looks at the information, then says I want the colors as well.
Okay, so we have a good part of our query already written, but we need to use a third table which is colors. Now, our main information table cars
stores the car color ID and this links back to the colors ID column. So, in a similar manner to the original, we can join a third table:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Damn, although the table was correctly joined and the related columns were linked, we forgot to pull in the actual information from the new table that we just linked.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
+----+--------+-------+
4 rows in set (0.00 sec)
Right, that's the boss off our back for a moment. Now, to explain some of this in a little more detail. As you can see, the from
clause in our statement links our main table (I often use a table that contains information rather than a lookup or dimension table. The query would work just as well with the tables all switched around, but make less sense when we come back to this query to read it in a few months time, so it is often best to try to write a query that will be nice and easy to understand - lay it out intuitively, use nice indenting so that everything is as clear as it can be. If you go on to teach others, try to instill these characteristics in their queries - especially if you will be troubleshooting them.
It is entirely possible to keep linking more and more tables in this manner.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
While I forgot to include a table where we might want to join more than one column in the join
statement, here is an example. If the models
table had brand-specific models and therefore also had a column called brand
which linked back to the brands
table on the ID
field, it could be done as this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
and b.brand=d.ID
where
b.ID=1
You can see, the query above not only links the joined tables to the main cars
table, but also specifies joins between the already joined tables. If this wasn't done, the result is called a cartesian join - which is dba speak for bad. A cartesian join is one where rows are returned because the information doesn't tell the database how to limit the results, so the query returns all the rows that fit the criteria.
So, to give an example of a cartesian join, lets run the following query:
select
a.ID,
b.model
from
cars a
join models b
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 1 | Sedan |
| 1 | 4WD |
| 1 | Luxury |
| 2 | Sports |
| 2 | Sedan |
| 2 | 4WD |
| 2 | Luxury |
| 3 | Sports |
| 3 | Sedan |
| 3 | 4WD |
| 3 | Luxury |
| 4 | Sports |
| 4 | Sedan |
| 4 | 4WD |
| 4 | Luxury |
| 5 | Sports |
| 5 | Sedan |
| 5 | 4WD |
| 5 | Luxury |
| 6 | Sports |
| 6 | Sedan |
| 6 | 4WD |
| 6 | Luxury |
| 7 | Sports |
| 7 | Sedan |
| 7 | 4WD |
| 7 | Luxury |
| 8 | Sports |
| 8 | Sedan |
| 8 | 4WD |
| 8 | Luxury |
| 9 | Sports |
| 9 | Sedan |
| 9 | 4WD |
| 9 | Luxury |
| 10 | Sports |
| 10 | Sedan |
| 10 | 4WD |
| 10 | Luxury |
+----+--------+
40 rows in set (0.00 sec)
Good god, that's ugly. However, as far as the database is concerned, it is exactly what was asked for. In the query, we asked for for the ID
from cars
and the model
from models
. However, because we didn't specify how to join the tables, the database has matched every row from the first table with every row from the second table.
Okay, so the boss is back, and he wants more information again. I want the same list, but also include 4WDs in it.
This however, gives us a great excuse to look at two different ways to accomplish this. We could add another condition to the where clause like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
or b.ID=3
While the above will work perfectly well, lets look at it differently, this is a great excuse to show how a union
query will work.
We know that the following will return all the Sports cars:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
And the following would return all the 4WDs:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
So by adding a union all
clause between them, the results of the second query will be appended to the results of the first query.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
union all
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
| 5 | 4WD | Green |
| 7 | 4WD | White |
| 9 | 4WD | Black |
+----+--------+-------+
7 rows in set (0.00 sec)
As you can see, the results of the first query are returned first, followed by the results of the second query.
In this example, it would of course have been much easier to simply use the first query, but union
queries can be great for specific cases. They are a great way to return specific results from tables from tables that aren't easily joined together - or for that matter completely unrelated tables. There are a few rules to follow however.
Now, you might be wondering what the difference is between using union
and union all
. A union
query will remove duplicates, while a union all
will not. This does mean that there is a small performance hit when using union
over union all
but the results may be worth it - I won't speculate on that sort of thing in this though.
On this note, it might be worth noting some additional notes here.
order by
but you can't use the alias anymore. In the query above, appending an order by a.ID
would result in an error - as far as the results are concerned, the column is called ID
rather than a.ID
- even though the same alias has been used in both queries.order by
statement, and it must be as the last statement.For the next examples, I am adding a few extra rows to our tables.
I have added Holden
to the brands table.
I have also added a row into cars
that has the color
value of 12
- which has no reference in the colors table.
Okay, the boss is back again, barking requests out - *I want a count of each brand we carry and the number of cars in it!` - Typical, we just get to an interesting section of our discussion and the boss wants more work.
Rightyo, so the first thing we need to do is get a complete listing of possible brands.
select
a.brand
from
brands a
+--------+
| brand |
+--------+
| Ford |
| Toyota |
| Nissan |
| Smart |
| BMW |
| Holden |
+--------+
6 rows in set (0.00 sec)
Now, when we join this to our cars table we get the following result:
select
a.brand
from
brands a
join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Nissan |
| Smart |
| Toyota |
+--------+
5 rows in set (0.00 sec)
Which is of course a problem - we aren't seeing any mention of the lovely Holden
brand I added.
This is because a join looks for matching rows in both tables. As there is no data in cars that is of type Holden
it isn't returned. This is where we can use an outer
join. This will return all the results from one table whether they are matched in the other table or not:
select
a.brand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Holden |
| Nissan |
| Smart |
| Toyota |
+--------+
6 rows in set (0.00 sec)
Now that we have that, we can add a lovely aggregate function to get a count and get the boss off our backs for a moment.
select
a.brand,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+--------------+
| brand | countOfBrand |
+--------+--------------+
| BMW | 2 |
| Ford | 2 |
| Holden | 0 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+--------------+
6 rows in set (0.00 sec)
And with that, away the boss skulks.
Now, to explain this in some more detail, outer joins can be of the left
or right
type. The Left or Right defines which table is fully included. A left outer join
will include all the rows from the table on the left, while (you guessed it) a right outer join
brings all the results from the table on the right into the results.
Some databases will allow a full outer join
which will bring back results (whether matched or not) from both tables, but this isn't supported in all databases.
Now, I probably figure at this point in time, you are wondering whether or not you can merge join types in a query - and the answer is yes, you absolutely can.
select
b.brand,
c.color,
count(a.id) as countOfBrand
from
cars a
right outer join brands b
on b.ID=a.brand
join colors c
on a.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| Ford | Blue | 1 |
| Ford | White | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| BMW | Blue | 1 |
| BMW | White | 1 |
+--------+-------+--------------+
9 rows in set (0.00 sec)
So, why is that not the results that were expected? It is because although we have selected the outer join from cars to brands, it wasn't specified in the join to colors - so that particular join will only bring back results that match in both tables.
Here is the query that would work to get the results that we expected:
select
a.brand,
c.color,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
left outer join colors c
on b.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| BMW | Blue | 1 |
| BMW | White | 1 |
| Ford | Blue | 1 |
| Ford | White | 1 |
| Holden | NULL | 0 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| Toyota | NULL | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
+--------+-------+--------------+
11 rows in set (0.00 sec)
As we can see, we have two outer joins in the query and the results are coming through as expected.
Now, how about those other types of joins you ask? What about Intersections?
Well, not all databases support the intersection
but pretty much all databases will allow you to create an intersection through a join (or a well structured where statement at the least).
An Intersection is a type of join somewhat similar to a union
as described above - but the difference is that it only returns rows of data that are identical (and I do mean identical) between the various individual queries joined by the union. Only rows that are identical in every regard will be returned.
A simple example would be as such:
select
*
from
colors
where
ID>2
intersect
select
*
from
colors
where
id<4
While a normal union
query would return all the rows of the table (the first query returning anything over ID>2
and the second anything having ID<4
) which would result in a full set, an intersect query would only return the row matching id=3
as it meets both criteria.
Now, if your database doesn't support an intersect
query, the above can be easily accomlished with the following query:
select
a.ID,
a.color,
a.paint
from
colors a
join colors b
on a.ID=b.ID
where
a.ID>2
and b.ID<4
+----+-------+----------+
| ID | color | paint |
+----+-------+----------+
| 3 | Blue | Metallic |
+----+-------+----------+
1 row in set (0.00 sec)
If you wish to perform an intersection across two different tables using a database that doesn't inherently support an intersection query, you will need to create a join on every column of the tables.