I am reading about NATURAL shorthand form for SQL joins and I see some traps:
One thing that completely destroys NATURAL
for me is that most of my tables have an id
column, which are obviously semantically all different. You could argue that having a user_id
makes more sense than id
, but then you end up writing things like user.user_id
, a violation of DRY. Also, by the same logic, you would also have columns like user_first_name
, user_last_name
, user_age
... (which also kind of makes sense in view that it would be different from, for example, session_age
)... The horror.
I'll stick to my JOIN ... ON ...
, thankyouverymuch. :)
NATURAL JOIN
syntax is anti-pattern:
Because of this, I don't recommend the syntax in any environment.
I also don't recommend mixing syntax (IE: using both NATURAL JOIN
and explicit INNER/OUTER JOIN syntax) - keep a consistent codebase format.
I agree with the other posters that an explicit join should be used for reasons of clarity and also to easily allow a switch to an "OUTER" join should your requirements change.
However most of your "traps" have nothing to do with joins but rather the evils of using "SELECT *" instead of explicitly naming the columns you require "SELECT a.col1, a.col2, b.col1, b.col2". These traps occurs whenever a wildcard column list is used.
Adding an extra reason not listed in any of the answers above. In postgres (not sure if this the case for other databases) if no column names are found in common between the two tables when using NATURAL JOIN
then a CROSS JOIN
is performed. This means that if you had an existing query and then you were to subsequently change one of the column names in a table, you would still get a set of rows returned from the query rather than an error. If instead you used the JOIN ... USING(...)
syntax you would get an error if the joining column was no longer there.
The postgres documentation has a note to this effect:
Note: USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.
These "traps", which seem to argue against natural joins, cut both ways. Suppose you add a new column to table A, fully expecting it to be used in joining with table B. If you know that every join of A and B is a natural join, then you're done. If every join explicitly uses USING, then you have to track them all down and change them. Miss one and there's a bug.
Use NATURAL joins when the semantics of the tables suggests that this is the right thing to do. Use explicit join criteria when you want to make sure the join is done in a specific way, regardless of how the table definitions might evolve.
Do you mean the syntax like this:
SELECT *
FROM t1, t2, t3 ON t1.id = t2.id
AND t2.id = t3.id
Versus this:
SELECT *
FROM t1
LEFT OUTER JOIN t2 ON t1.id = t2.id
AND t2.id = t3.id
I prefer the 2nd syntax and also format it differently:
SELECT *
FROM T1
LEFT OUTER JOIN T2 ON T2.id = T1.id
LEFT OUTER JOIN T3 ON T3.id = T2.id
In this case, it is very clear what tables I am joining and what ON clause I am using to join them. By using that first syntax is just too easy to not put in the proper JOIN and get a huge result set. I do this because I am prone to typos, and this is my insurance against that. Plus, it is visually easier to debug.