问题
I've got a large query where a simple subquery optimization dropped it from 8 minutes down to 20 seconds. I'm not sure I understand why the optimization had such a drastic effect.
In essence, here's the problem part:
SELECT (bunch of stuff)
FROM
a LEFT OUTER JOIN b ON a.ID = b.a
LEFT OUTER JOIN c ON b.ID = c.b
...
...
INNER JOIN veryLargeTable
ON a.ID = veryLargeTable.a
AND veryLargeTable.PetID =
(SELECT id from Pets WHERE Pets.Name = 'Something') /* BAD! */
...
...
In all, there are 16 joined tables. If I replace the second predicate of the veryLargeTable
join with a pre-populated variable containing the petID (instead of using the subquery) the entire query speeds up dramatically:
AND veryLargeTable.PetID = @petID /* Awesome! */
Obviously, (SELECT id from Pets WHERE Name = 'Something')
is being executed for every row. There are two things I don't fully understand:
As far as I can tell, this is a non-correlated subquery. The Pets table is not part of the outer query at all. Aren't non-correlated subqueries independently evaluated (and hence optimized)? Why isn't this the case here?
The execution plans are dramatically different. In the failure case (above), the entire subtree deals with an estimated 950k rows. In the win case (using a variable instead of a subquery), there's only about 125k estimated rows. What's going on? Why are so many more rows involved if that subquery is there? The Pets.Name column definitely has unique data (but no unique constraint as far as I can tell).
Note that moving the predicate to the WHERE clause doesn't affect the query in either case, as I would expect, since it's an INNER JOIN.
Insights appreciated!
回答1:
It has been my experience that, the more complex your queries get, the less able the SQL optimizer is to create deft plans. Here you've got 16 joins, some or most are outer joins, you've got at least one subquery... toss in enough indexes, cardinalities, views, outer applies, and who knows what else and no one, not even Microsoft engineers*, can figure out routines that will uniformly and regularly generate The most optimal plans.
What you've described, I've experienced numerous times -- change one simple thing in a messy query and everything's an order of magnitude faster (or, gnashes teeth, slower). I have no method for determining when complex is too complex, it's more a feeling than anything else. My general rule of thumb is, if it looks too long or too complex, simplify where you can--such as your pre-selected single nested value, or breaking out part of the query than will always run fast with a small result set, and running it first and storing the results in a temp table.
( * Please note that this is mild sarcsam)
回答2:
As an alternative, I think you could eliminate the sub-query with:
...
INNER JOIN veryLargeTable vLT
ON a.ID = vLT.a
INNER JOIN Pets p
ON vLT.PetID = p.id
and p.Name = 'Something'
...
回答3:
I personally think the outcome is not surprising if there's no index on Pets.Name. If you create a unique index on Pets.Name you are likely to see better results. Without index from the server's point of view the subquery may return multiple rows or NULL. Perhaps optimiser could do better; it often needs help.
回答4:
The reason is as you pointed out, and from my experience, that often even the simplest non-correlated subqueries are often recomputed by SQL Server's query optimizer.
For example, you can look at the execution plan of the following query and see that the non-correlated subquery is recomputed.
SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)
UNION ALL
SELECT ID
FROM #table1
WHERE ID in (SELECT ID from #table1)
This is with or without a clustered index on the attribute, "ID" in this case. As someone pointed out, you could rewrite this query to use a join instead of the subquery. In many cases that can be done, however, if the subquery returns instead an aggregate scalar e.g.
where ID = (select MAX(ID) from #table1)
then a join rewrite might not work so easily.
来源:https://stackoverflow.com/questions/3577658/why-is-this-non-correlated-subquery-causing-such-problems