Join vs. sub-query

前端未结

关注

 19  2021

广开言路 2020-11-21 05:05

I am an old-school MySQL user and have always preferred JOIN over sub-query. But nowadays everyone uses sub-query, and I hate it; I don\'t know why.

19条回答

-上瘾入骨i (楼主)

2020-11-21 05:21
In the year 2010 I would have joined the author of this questions and would have strongly voted for JOIN, but with much more experience (especially in MySQL) I can state: Yes subqueries can be better. I've read multiple answers here; some stated subqueries are faster, but it lacked a good explanation. I hope I can provide one with this (very) late answer:

First of all, let me say the most important: There are different forms of sub-queries

And the second important statement: Size matters

If you use sub-queries, you should be aware of how the DB-Server executes the sub-query. Especially if the sub-query is evaluated once or for every row! On the other side, a modern DB-Server is able to optimize a lot. In some cases a subquery helps optimizing a query, but a newer version of the DB-Server might make the optimization obsolete.

Sub-queries in Select-Fields
```
SELECT moo, (SELECT roger FROM wilco WHERE moo = me) AS bar FROM foo
```
Be aware that a sub-query is executed for every resulting row from foo.
Avoid this if possible; it may drastically slow down your query on huge datasets. However, if the sub-query has no reference to foo it can be optimized by the DB-server as static content and could be evaluated only once.

Sub-queries in the Where-statement
```
SELECT moo FROM foo WHERE bar = (SELECT roger FROM wilco WHERE moo = me)
```
If you are lucky, the DB optimizes this internally into a JOIN. If not, your query will become very, very slow on huge datasets because it will execute the sub-query for every row in foo, not just the results like in the select-type.

Sub-queries in the Join-statement
```
SELECT moo, bar 
  FROM foo 
    LEFT JOIN (
      SELECT MIN(bar), me FROM wilco GROUP BY me
    ) ON moo = me
```
This is interesting. We combine JOIN with a sub-query. And here we get the real strength of sub-queries. Imagine a dataset with millions of rows in wilco but only a few distinct me. Instead of joining against a huge table, we have now a smaller temporary table to join against. This can result in much faster queries depending on database size. You can have the same effect with CREATE TEMPORARY TABLE ... and INSERT INTO ... SELECT ..., which might provide better readability on very complex queries (but can lock datasets in a repeatable read isolation level).

Nested sub-queries
```
SELECT moo, bar
  FROM (
    SELECT moo, CONCAT(roger, wilco) AS bar
      FROM foo
      GROUP BY moo
      HAVING bar LIKE 'SpaceQ%'
  ) AS temp_foo
  ORDER BY bar
```
You can nest sub-queries in multiple levels. This can help on huge datasets if you have to group or sort the results. Usually the DB-Server creates a temporary table for this, but sometimes you do not need sorting on the whole table, only on the resultset. This might provide much better performance depending on the size of the table.

Conclusion

Sub-queries are no replacement for a JOIN and you should not use them like this (although possible). In my humble opinion, the correct use of a sub-query is the use as a quick replacement of CREATE TEMPORARY TABLE .... A good sub-query reduces a dataset in a way you cannot accomplish in an ON statement of a JOIN. If a sub-query has one of the keywords GROUP BY or DISTINCT and is preferably not situated in the select fields or the where statement, then it might improve performance a lot.
0 讨论(0)

查看其它19个回答
发布评论:

提交评论
- 加载中...

Join vs. sub-query

Sub-queries in Select-Fields

Sub-queries in the Where-statement

Sub-queries in the Join-statement

Nested sub-queries

Conclusion