Question
I have an SQL question, related to this and this question (but different). Basically I want to know how I can avoid a nested query.
Let's say I have a huge table of jobs (jobs) executed by a company over its history. These jobs are characterized by year, month, location and the code of the tool used for the job. Additionally, I have a table of tools (tools) that translates tool codes to tool descriptions and holds further data about each tool. Now they want a website where they can select year, month, location and tool using dropdown boxes, after which the matching jobs are displayed. I want to fill the last dropdown with only the relevant tools matching the earlier selection of year, month and location, so I wrote the following nested query:
SELECT c.tool_code, t.tool_description
FROM (
SELECT DISTINCT j.tool_code
FROM jobs AS j
WHERE j.year = ....
AND j.month = ....
AND j.location = ....
) AS c
LEFT JOIN tools AS t
ON c.tool_code = t.tool_code
ORDER BY c.tool_code ASC
I resorted to this nested query because it was much faster than performing a JOIN on the complete tables and selecting from that. It got my query time down a lot. But as I have recently read that MySQL nested queries should be avoided at all costs, I am wondering whether I am wrong in this approach. Should I rewrite my query differently? And how?
Answer 1:
No, you shouldn't; your query is fine.
Just create an index on jobs (year, month, location, tool_code) and one on tools (tool_code) so that INDEX FOR GROUP-BY (reported by EXPLAIN as "Using index for group-by") can be used.
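As a rough sketch, the two indexes suggested above could be created like this. The index names are made up for illustration, and the second statement may be unnecessary if tool_code is already the primary key of tools:

```sql
-- Hypothetical index names; the column order follows the answer above.
CREATE INDEX ix_jobs_year_month_location_tool
    ON jobs (year, month, location, tool_code);

-- Only needed if tool_code is not already the primary key or otherwise indexed.
CREATE INDEX ix_tools_tool_code
    ON tools (tool_code);
```

The three equality-filtered columns come first and tool_code last, so the tool codes for one year/month/location combination sit next to each other, already sorted, in the index.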
The article you provided describes subquery predicates (IN (SELECT ...)), not nested queries (SELECT FROM (SELECT ...)).
Even for subqueries, the article is wrong: while MySQL is not able to optimize all subqueries, it deals with IN (SELECT …) predicates just fine.
I don't know why the author chose to put DISTINCT here:
SELECT id, name, price
FROM widgets
WHERE id IN
(
SELECT DISTINCT widgetId
FROM widgetOrders
)
and why they think this will help improve performance, but given that widgetId is indexed, MySQL will just transform this query:
SELECT id, name, price
FROM widgets
WHERE id IN
(
SELECT widgetId
FROM widgetOrders
)
into an index_subquery.
Essentially, this is just like an EXISTS clause: the inner subquery will be executed once per widgets row with the additional predicate added:
SELECT NULL
FROM widgetOrders
WHERE widgetId = widgets.id
and will stop on the first match in widgetOrders.
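Written out by hand, the equivalent EXISTS form would look roughly like the following. This only illustrates the logic described above, not what the optimizer literally generates; the aliases w and o are added here for readability:

```sql
-- For each widgets row, probe widgetOrders by widgetId and stop at the first hit.
SELECT w.id, w.name, w.price
FROM widgets AS w
WHERE EXISTS
(
    SELECT NULL
    FROM widgetOrders AS o
    WHERE o.widgetId = w.id
);
```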
This query:
SELECT DISTINCT w.id, w.name, w.price
FROM widgets w
INNER JOIN
widgetOrders o
ON w.id = o.widgetId
will have to use a temporary table to get rid of the duplicates and will be much slower.
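To see which plan a given server actually picks, both variants can be prefixed with EXPLAIN. This is a sketch that assumes an index on widgetOrders (widgetId); the exact output depends on the MySQL version and on the data, so the comments name things to look for rather than guaranteed results:

```sql
-- IN form: check the type column of the subquery row (e.g. index_subquery).
EXPLAIN
SELECT id, name, price
FROM widgets
WHERE id IN (SELECT widgetId FROM widgetOrders);

-- JOIN + DISTINCT form: check the Extra column for "Using temporary".
EXPLAIN
SELECT DISTINCT w.id, w.name, w.price
FROM widgets AS w
INNER JOIN widgetOrders AS o
    ON w.id = o.widgetId;
```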
Answer 2:
You could avoid the subquery by using GROUP BY, but if the subquery performs better, keep it.
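For comparison, a GROUP BY version without the derived table might look like this. The user variables @year, @month and @location and their values are placeholders introduced here for the dropdown selection, which the question leaves elided; the string type for location is also an assumption:

```sql
-- Hypothetical placeholder values; set these from the dropdown selection.
SET @year = 2010, @month = 1, @location = 'somewhere';

SELECT j.tool_code, t.tool_description
FROM jobs AS j
LEFT JOIN tools AS t
    ON j.tool_code = t.tool_code
WHERE j.year = @year
  AND j.month = @month
  AND j.location = @location
GROUP BY j.tool_code, t.tool_description
ORDER BY j.tool_code ASC;
```

Grouping on both selected columns keeps the statement valid under ONLY_FULL_GROUP_BY. It matches the nested query only if tool_code is unique in tools, so it is worth timing both forms before switching.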
Why do you use a LEFT JOIN instead of a JOIN to join tools?
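The difference only shows up for tool codes that appear in jobs but have no row in tools. A sketch of the inner-join variant of the question's query, where @year, @month and @location are hypothetical user variables standing in for the elided filter values:

```sql
-- @year, @month, @location are hypothetical; SET them before running.
-- With JOIN (inner join), tool codes missing from tools are dropped;
-- with LEFT JOIN they are kept and tool_description comes back as NULL.
SELECT c.tool_code, t.tool_description
FROM (
    SELECT DISTINCT j.tool_code
    FROM jobs AS j
    WHERE j.year = @year
      AND j.month = @month
      AND j.location = @location
) AS c
JOIN tools AS t
    ON c.tool_code = t.tool_code
ORDER BY c.tool_code ASC;
```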
Source: https://stackoverflow.com/questions/2132905/how-to-avoid-nested-sql-query-in-this-case