I was reading this page about APPLY:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/07/07/using-cross-apply-to-optimize-joins-on-between-conditions.aspx
There is no definitive "before" and "after" in these queries. The RDBMS is allowed to decide when to run each part of the query, as long as the results the query produces do not change.
In the first case, there is nothing the query can do to pre-filter the rows of Commercials, because the WHERE clause constrains only the rows of Calls. These constraints specify a range for c.AirTime in terms of the corresponding row of Commercials, so no pre-filtering is possible: all rows of Calls would be considered for each row of Commercials.
In the second case, however, the RDBMS can improve on the time by observing that you additionally constrain the range for c.AirTime to the span from 23:45 on Jun-30, 2008 through midnight of Jul-1, 2008 by constraining s.StartedAt, to which c.AirTime is joined. This can allow the optimizer to use an index, if one is defined on the Calls.AirTime column.
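If no such index exists yet, creating one might look like the sketch below. The index name IX_Calls_AirTime is made up for illustration, and the statement assumes dbo.Calls has the AirTime column discussed here. Whether the optimizer actually picks the index is something only the query plan can confirm.

```sql
-- Hypothetical index name; assumes dbo.Calls has an AirTime column
-- as in the linked article.
CREATE NONCLUSTERED INDEX IX_Calls_AirTime
    ON dbo.Calls (AirTime);
```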
The important observation here is that the RDBMS can do very clever things when optimizing your query. It arrives at the optimized strategy by applying multiple rules of logic, trying to push the constraints closer to the "source of rows" in a join. The best way to check what the optimizer does is to read the query plan.
A massive amount of logic, time, blood, sweat, and tears has gone into the SQL Server query optimizer, which produces the query plan that determines how a statement is actually processed. What is written in a statement does not dictate how it actually executes in the engine.
To really see what's going on, run your queries with the "Include Actual Execution Plan" option enabled in Management Studio. My guess is that, based on the additional WHERE condition, the data is being pre-filtered by the optimizer.
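The same information can also be requested from a script. The sketch below is only an illustration: the two SELECT statements are my reconstruction of the queries being compared, pieced together from the join and date ranges quoted in this thread, so adjust them to match what you actually ran.

```sql
-- Return the actual execution plan (as XML) plus I/O and timing figures
-- for every statement that follows in this session.
SET STATISTICS XML ON;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement.
GO

-- First query (assumed shape): only Calls is constrained.
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Commercials s
JOIN dbo.Calls c
  ON c.AirTime >= s.StartedAt
 AND c.AirTime < s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00';
GO

-- Second query (assumed shape): Commercials is constrained as well.
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Commercials s
JOIN dbo.Calls c
  ON c.AirTime >= s.StartedAt
 AND c.AirTime < s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'
  AND s.StartedAt BETWEEN '20080630 23:45' AND '20080701 03:00';
GO

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Comparing the two plans side by side (scan versus seek operators, estimated versus actual row counts) shows where the extra condition changes the work being done.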
The second query is faster because you are limiting the scope of the join.
First query: A join B
Second query: A join subset(B)
Since subset(B) is smaller than B itself, there are far fewer matches to scan for.
And that leads to the question: does the column used in that join have an index? (Probably not, otherwise the speeds would not differ much.)
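To make the "A join subset(B)" idea concrete, the subset can be written out explicitly as a derived table. This is only a sketch: which table plays the role of B is my assumption (here Commercials, since that is the table the extra date condition applies to), and the column names and date literals are taken from the query quoted in this thread.

```sql
-- The derived table "s" is the explicit subset: only commercials that
-- started inside the window take part in the join.
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM (
    SELECT StartedAt, EndedAt
    FROM dbo.Commercials
    WHERE StartedAt BETWEEN '20080630 23:45' AND '20080701 03:00'
) AS s
JOIN dbo.Calls c
  ON c.AirTime >= s.StartedAt
 AND c.AirTime < s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00';
```

For an inner join this returns the same rows as putting the StartedAt condition in the WHERE clause, and the optimizer will often produce the same plan for both forms. The rewrite is mainly a way to see the reduced input, not a guaranteed speedup.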
They are not the same queries, so why would you expect the same response times?
If the two queries return a different number of rows, use TOP X for a fairer comparison.
The query optimizer can get very smart (and it can get stupid).
View the query plan to see what is going on.
My experience is that the query optimizer has a better chance of getting smart if you pull the conditions into the join:
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Commercials s
JOIN dbo.Calls c
ON c.AirTime >= s.StartedAt
AND c.AirTime < s.EndedAt
AND c.AirTime BETWEEN '20080701' AND '20080701 03:00'
AND s.StartedAt BETWEEN '20080630 23:45' AND '20080701 03:00'
If you have just a single join, the query optimizer may apply a WHERE condition early.
But with multiple joins, I have never seen the query optimizer apply a WHERE condition early.