Number of times a nested query is executed

你说的曾经没有我的故事 提交于 2019-12-02 04:49:17

As others point out, this correlated subquery can be rewritten as a join, but that is not entirely the complete story because the execution plan for an untransformed EXISTS is going to look pretty much like a join anyway. So this is not really a syntax question but a query optimisation issue.

EXISTS is really just syntactic shorthand for "join to this data set but only to a single row in it even if there are 1,000,000 matches", or what is also known as a semijoin.

So the semijoin required by an EXISTS predicate against a correlated or uncorrelated subquery can be implemented in a number of ways, which depend to a large extent on the numbers or records in the two tables.

If you imagine that CUSTOMER is estimated to have a single row, and the optimiser estimates that there are many thousands of rows in RENTALS for which PICKUP = 'CARY', then the optimiser's is pretty likely to read the row from the CUSTOMER TABLE and perform a single lookup against the RENTALS table.

If there are an estimated one million CUSTOMERS and only one row in the RENTALS table then that execution plan would be crazy -- the optimiser could instead invert the join by leading with the RENTALS table and looking up against the CUSTOMER table the single row to be returned. In that case the subquery has arguably not been executed at all.

In between these extremes there are various other optimisation. For example, building the unique values of the RENTAL.CID column into an in-memory hash table for rows where PICKUP='CARY' and full-scanning the CUSTOMER TABLE to probe that hash table for every row, which would be a HASH SEMIJOIN. Again, no execution of a recognisable subquery. (And the query rewrite that Barmer suggests is likely to lead to that kind of plan, but might also constrain the optimiser from seeing other plans suitable for other data distributions and cardinalities).

So, as other answers say, the question is really moot, because I think that there are two important lessons here:

  1. Many different SQL statements can lead to the same execution plan, and a single SQL statement can also lead to multiple execution plans.
  2. You should write queries that express syntactically the result that you want, and not in general to preempt and constrain the query optimiser's choices.

The second point is important, as it argues against some developer's instinct to avoid writing correlated subqueries (and EXISTS or NOT EXISTS in particular it seems) in the hope of providing their own optimisation. In particular, replacing an EXISTS with an outer join can be very suboptimal given the right/wrong data distributions.

To get to the direct answer to the question, I'd say either:

  • 0
  • 1
  • However many rows are in CUSTOMERS
  • Possibly something else.

There's no correct theoretical answer to this question. A smart query optimizer can transform the query into a JOIN:

SELECT CID, CNAME
FROM Customer
LEFT JOIN (SELECT DISTINCT CID FROM Rentals WHERE PICKUP = 'CARY') as Rentals
ON Customer.CID = Rentals.CID
WHERE Rentals.CID IS NOT NULL

Now the sub-query is just executed once.

The query won't be executed that way. The query planner will turn that into something where it will run one query against the result of the other, not running the subquery over and over.

Most likely the query plan for your query will look the same as for a query where you make a left join and check for existing records:

select c.CID, c.CNAME
from CUSTOMER c
left join RENTALS r on c.CID = r.CID and r.PICKUP = 'CARY'
where r.CID is not null
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!