Whether Inner Queries Are Okay?

问题

I often see something like...

SELECT events.id, events.begin_on, events.name
  FROM events
 WHERE events.user_id IN ( SELECT contacts.user_id 
                             FROM contacts 
                            WHERE contacts.contact_id = '1')
   OR events.user_id IN ( SELECT contacts.contact_id 
                            FROM contacts 
                           WHERE contacts.user_id = '1')

Is it okay to have query in query? Is it "inner query"? "Sub-query"? Does it counts as three queries (my example)? If its bad to do so... how can I rewrite my example?

回答1:

Your example isn't too bad. The biggest problems usually come from cases where there is what's called a "correlated subquery". That's when the subquery is dependent on a column from the outer query. These are particularly bad because the subquery effectively needs to be rerun for every row in the potential results.

You can rewrite your subqueries using joins and GROUP BY, but as you have it performance can vary, especially depending on your RDBMS.

回答2:

It varies from database to database, especially if the columns compared are

indexed or not
nullable or not

..., but generally if your query is not using columns from the table joined to -- you should be using either IN or EXISTS:

SELECT e.id, e.begin_on, e.name
  FROM EVENTS e
 WHERE EXISTS (SELECT NULL
                 FROM CONTACTS c 
                WHERE ( c.contact_id = '1' AND c.user_id = e.user_id )
                   OR ( c.user_id = '1' AND c.contact_id = e.user_id )

Using a JOIN (INNER or OUTER) can inflate records if the child table has more than one record related to a parent table record. That's fine if you need that information, but if not then you need to use either GROUP BY or DISTINCT to get a result set of unique values -- and that can cost you when you review the query costs.

EXISTS

Though EXISTS clauses look like correlated subqueries, they do not execute as such (RBAR: Row By Agonizing Row). EXISTS returns a boolean based on the criteria provided, and exits on the first instance that is true -- this can make it faster than IN when dealing with duplicates in a child table.

回答3:

You could JOIN to the Contacts table instead:

SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
ON (events.user_id = contacts.contact_id OR events.user_id = contacts.user_id)
WHERE events.user_id = '1'
GROUP BY events.id  
-- exercise: without the GROUP BY, how many duplicate rows can you end up with?

This leaves the following question up to the database: "Should we look through all the contacts table and find all the '1's in the various columns, or do something else?" where your original SQL didn't give it much choice.

回答4:

The most common term for this sort of query is "subquery." There is nothing inherently wrong in using them, and can make your life easier. However, performance can often be improved by rewriting queries w/ subqueries to use JOINs instead, because the server can find optimizations.

In your example, three queries are executed: the main SELECT query, and the two SELECT subqueries.

SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
ON (events.user_id = contacts.contact_id OR events.user_id = contacts.user_id)
WHERE events.user_id = '1'
GROUP BY events.id

In your case, I believe the JOIN version will be better as you can avoid two SELECT queries on contacts, opting for the JOIN instead.

See the mysql docs on the topic.

来源：https://stackoverflow.com/questions/6586079/whether-inner-queries-are-okay

标签

sql

subquery