How to optimize MySQL “Order By Limit 1” in queries that join multiple tables?

Posted by 心不动则不痛 on 2019-12-10 11:32:34

Question


So I have a query like this:

SELECT tablea.name, tablea.views from tablea 
inner join tableb on (tablea.id = tableb.id and tablea.balance > 0) 
order by tablea.views asc limit 1

However, the problem is that when I run it, it runs quite slowly (4+ seconds). Interestingly, when the 'order by' clause is removed but the limit 1 is kept, it runs in approximately 0.005 seconds.

Even more interestingly: when I don't join it to tableb, i.e.:

SELECT tablea.name, tablea.views from tablea 
where tablea.balance > 0 
order by tablea.views asc limit 1

The query runs in 0.005 seconds usually.

Notes:

  • The column views in tablea is indexed
  • tablea and tableb have a one-to-one relationship in terms of id, and have roughly the same number of rows.

Why is there such a drastic difference in performance between the first query, the first query when 'order by' is removed, and the second query?

Is there any way to make the sorting much faster when joining two tables?


Answer 1:


One possible explanation is that MySQL is choosing to do the ordering before it does the actual join. As you saw when you removed the ORDER BY clause from your original query, the join by itself is not a performance problem. One way to get around this would be to wrap your original query in a subquery, and then order the result:

SELECT *
FROM
(
    SELECT tablea.name,
           tablea.views
    FROM tablea
    INNER JOIN tableb
        ON tablea.id = tableb.id AND
           tablea.balance > 0
) t
ORDER BY t.views ASC
LIMIT 1

If this works, then it probably confirms what I speculated. In this case, the subquery forces MySQL to only order records which result from the actual subquery. In any case, you should get in the habit of running EXPLAIN on such queries. My guess is that the index isn't being used/effective when joining in your original query.
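
As a rough sketch of that EXPLAIN check, the two statements below compare the plan of the original query with the plan of the subquery version. The output depends on your data, indexes, and MySQL version, so none is shown here. One assumption worth checking: MySQL 5.7 and later can merge a derived table such as `t` into the outer query, in which case the two plans may come out the same.

```sql
-- Plan for the original query. Look at which table is read first and
-- whether the Extra column mentions "Using temporary" or "Using filesort".
EXPLAIN
SELECT tablea.name, tablea.views
FROM tablea
INNER JOIN tableb
    ON tablea.id = tableb.id AND tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1;

-- Plan for the subquery version, for comparison.
EXPLAIN
SELECT *
FROM
(
    SELECT tablea.name, tablea.views
    FROM tablea
    INNER JOIN tableb
        ON tablea.id = tableb.id AND tablea.balance > 0
) t
ORDER BY t.views ASC
LIMIT 1;
```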

Reference: Slow query when using ORDER BY




Answer 2:


Given INDEX(x)
ORDER BY x LIMIT 1

Will conveniently use the index and pick off the first item.

Given INDEX(x)
WHERE ...
ORDER BY x LIMIT 1

May also use the index, hoping that some early row satisfies the WHERE. If not, it may have to scan the entire table to find the one row!

Given INDEX(a, x)
WHERE a = 12
ORDER BY x LIMIT 1

No problem -- Look in the index for a=12; pick the first item.

Given INDEX(a, x)
WHERE a > 12
ORDER BY x LIMIT 1

Now the index is not so good. It will need to pick up all the rows with a>12, sort by x, then deliver one row.

In general, if the WHERE and ORDER BY can both be completely satisfied by an index, then LIMIT n can be optimized. (This assumes there is no GROUP BY, or that the GROUP BY and ORDER BY are identical.)
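
To see the difference between the equality and range cases for yourself, something like the following could be run. The table `t`, its columns, and the index name `idx_a_x` are hypothetical and exist only for illustration. With a realistic amount of data loaded, I would expect the second plan to show "Using filesort" in the Extra column and the first not to, but treat that as something to verify on your own server rather than a guarantee.

```sql
-- Hypothetical table, used only to illustrate the INDEX(a, x) cases.
CREATE TABLE t (
    id INT NOT NULL PRIMARY KEY,
    a  INT NOT NULL,
    x  INT NOT NULL,
    INDEX idx_a_x (a, x)
);

-- Equality on a: within a = 12 the index entries are already ordered by x,
-- so the first matching entry is the answer.
EXPLAIN SELECT id FROM t WHERE a = 12 ORDER BY x LIMIT 1;

-- Range on a: the entries for a > 12 are ordered by a first, not by x,
-- so the matching rows have to be sorted before one can be returned.
EXPLAIN SELECT id FROM t WHERE a > 12 ORDER BY x LIMIT 1;
```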

That's with one table. When you JOIN two (or more) tables, it gets messier. With two tables, the Optimizer chooses one table, finds what it can there, then does a Nested Loop Join to the other table.

Usually (not always), a WHERE clause (on one table) tells the Optimizer "pick me". If that's the same table as the ORDER BY, then the above discussion may kick in.

Without a WHERE clause, the Optimizer usually starts with the smaller table. (Note: The table size is based on row estimates and may not be correct every time.)
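
If EXPLAIN shows the Optimizer starting with tableb, one way to test whether the join order is the culprit is MySQL's STRAIGHT_JOIN, which makes the left table be read before the right one. The sketch below is meant as a diagnostic, not a recommended permanent fix; moving the balance condition into WHERE is my own rearrangement and does not change the result of an inner join.

```sql
-- Force tablea to be read first, so that an index on tablea.views
-- at least has a chance of being used for the ORDER BY ... LIMIT 1.
SELECT tablea.name, tablea.views
FROM tablea
STRAIGHT_JOIN tableb
    ON tablea.id = tableb.id
WHERE tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1;
```

Running EXPLAIN on this version and on the original shows whether the table order actually changed.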

Your first query might be sped up by using WHERE EXISTS ( ... tableb ... ) instead of the JOIN tableb.... The Optimizer would see that as something worth optimizing.
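
A minimal sketch of that EXISTS rewrite, assuming tableb is only needed to confirm that a matching id exists (the query selects no columns from it):

```sql
-- tableb is only probed for existence, so the outer query remains a
-- single-table query on tablea with WHERE, ORDER BY and LIMIT.
SELECT a.name, a.views
FROM tablea AS a
WHERE a.balance > 0
  AND EXISTS (SELECT 1 FROM tableb AS b WHERE b.id = a.id)
ORDER BY a.views ASC
LIMIT 1;
```

Because id is one-to-one between the two tables, this should return the same row as the JOIN version. Whether it is actually faster depends on the plan the Optimizer picks, so it is worth checking with EXPLAIN.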

Etc, etc, etc.

Note that your "0.005 seconds" was "luck", most likely because a row satisfying the WHERE happened to turn up early in the scan.

If you want to dig deeper, provide SHOW CREATE TABLE (so we can see the index(es), etc), EXPLAIN SELECT (so we can discuss what the Optimizer decided on) and, if possible, EXPLAIN FORMAT=JSON SELECT ... for more details. Also see my indexing cookbook.
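
For the tables in the question, gathering that information might look like the following sketch. The statements are standard MySQL; their output is specific to your server and is not shown here.

```sql
-- Table definitions, including every index on each table.
SHOW CREATE TABLE tablea;
SHOW CREATE TABLE tableb;

-- Detailed plan for the slow query, including cost estimates and
-- whether a sort step is needed.
EXPLAIN FORMAT=JSON
SELECT tablea.name, tablea.views
FROM tablea
INNER JOIN tableb
    ON tablea.id = tableb.id AND tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1;
```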



Source: https://stackoverflow.com/questions/41919504/how-to-optimize-mysql-order-by-limit-1-in-queries-that-join-multiple-tables
