Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- C
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN
sometimes it's necessary to put conditions in the join clause.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
The relational algebra allows interchangeability of the predicates in the WHERE
clause and the INNER JOIN
, so even INNER JOIN
queries with WHERE
clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN
process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN
relatively "incomplete" and putting some of the criteria in the WHERE
simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.