So I have a query like this:
SELECT tablea.name, tablea.views
FROM tablea
INNER JOIN tableb ON (tablea.id = tableb.id AND tablea.balance > 0)
ORDER BY tablea.views ASC
LIMIT 1
However, when I run it, it is quite slow (4+ seconds). Interestingly, when the ORDER BY clause is removed but LIMIT 1 is kept, it runs in approximately 0.005 seconds.
Even more interestingly: when I don't join it to tableb, i.e.:
SELECT tablea.name, tablea.views
FROM tablea
WHERE tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1
This query usually runs in 0.005 seconds.
Notes:
- The column views in tablea is indexed
- tablea and tableb have a 1-to-1 relationship on id and have roughly the same number of rows.
Why is there such a drastic difference in performance between the first query, the first query when 'order by' is removed, and the second query?
Is there any way to make the sorting much faster when joining two tables?
One possible explanation is that MySQL is choosing to do the ordering before it does the actual join. As you saw when you removed the ORDER BY clause from your original query, the join by itself is not a performance problem. One way around this would be to wrap your original query in a subquery and then order its result:
SELECT *
FROM
(
    SELECT tablea.name,
           tablea.views
    FROM tablea
    INNER JOIN tableb
        ON tablea.id = tableb.id AND
           tablea.balance > 0
) t
ORDER BY t.views ASC
LIMIT 1
If this works, then it probably confirms what I speculated. In this case, the subquery forces MySQL to order only the records that result from the subquery. In any case, you should get in the habit of running EXPLAIN on such queries. My guess is that the index isn't being used, or isn't effective, when joining in your original query.
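As a minimal sketch of that check, the two statements below ask MySQL for the plan of the original query and of the subquery rewrite, using the table and column names from the question. The actual output depends on the MySQL version and on which indexes exist, so none is shown here.

```sql
-- Plan for the original (slow) query.
EXPLAIN
SELECT tablea.name, tablea.views
FROM tablea
INNER JOIN tableb
        ON tablea.id = tableb.id
       AND tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1;

-- Plan for the subquery rewrite, for comparison.
EXPLAIN
SELECT *
FROM (
    SELECT tablea.name, tablea.views
    FROM tablea
    INNER JOIN tableb
            ON tablea.id = tableb.id
           AND tablea.balance > 0
) t
ORDER BY t.views ASC
LIMIT 1;
```

In the output, look at which table is read first, which index appears in the key column for each table, and whether the Extra column shows "Using temporary; Using filesort", which indicates that the rows are collected and sorted rather than read in index order.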
Reference: Slow query when using ORDER BY
Given INDEX(x)
ORDER BY x LIMIT 1
Will conveniently use the index and pick off the first item.
Given INDEX(x)
WHERE ...
ORDER BY x LIMIT 1
May also use the index, in the hope that some early row satisfies the WHERE. If none does, it may have to scan the entire table to find the one row!
Given INDEX(a, x)
WHERE a = 12
ORDER BY x LIMIT 1
No problem -- Look in the index for a=12; pick the first item.
Given INDEX(a, x)
WHERE a > 12
ORDER BY x LIMIT 1
Now the index is not so good. It will need to pick up all the rows with a>12, sort by x, then deliver one row.
In general, if the WHERE and ORDER BY can both be completely satisfied by an index, then LIMIT n can be optimized. (This assumes there is no GROUP BY, or that the GROUP BY and ORDER BY are identical.)
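The INDEX(a, x) cases above can be tried out with a sketch like the following. The table t, its columns, and the index name idx_a_x are hypothetical and exist only for illustration.

```sql
-- Hypothetical table used only to illustrate the index patterns above.
CREATE TABLE t (
    id INT NOT NULL PRIMARY KEY,
    a  INT NOT NULL,
    x  INT NOT NULL,
    INDEX idx_a_x (a, x)
);

-- Equality on a: within a = 12 the index entries are already ordered by x,
-- so the first matching index entry is the answer.
EXPLAIN
SELECT id, a, x FROM t WHERE a = 12 ORDER BY x LIMIT 1;

-- Range on a: index entries are ordered by a first, so the matching rows
-- are not in x order and typically have to be sorted before LIMIT applies.
EXPLAIN
SELECT id, a, x FROM t WHERE a > 12 ORDER BY x LIMIT 1;
```

Comparing the Extra column of the two plans is one way to see the difference: the range case would typically report "Using filesort", while the equality case should not.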
That's with one table. When you JOIN two (or more) tables, it gets messier. With two tables, the Optimizer chooses one table, finds what it can there, then does a Nested Loop Join to the other table.
Usually (not always), a WHERE clause (on one table) tells the Optimizer "pick me". If that's the same table as the one in the ORDER BY, then the above discussion may kick in.
Without a WHERE clause, the Optimizer usually starts with the smaller table. (Note: The table size is based on row estimates and may not be correct every time.)
Your first query might be sped up by using WHERE EXISTS ( ... tableb ... ) instead of the JOIN tableb... The Optimizer would see that as something worth optimizing.
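The answer leaves the body of the EXISTS subquery elided, so the following is only one possible reading of it. It assumes that id is the only join condition and that each tablea row matches at most one tableb row, as the question states, so the result should be the same as with the JOIN.

```sql
-- One possible EXISTS form of the first query (an assumption, not the
-- answerer's exact statement): tableb is only used to check that a
-- matching row exists, and balance moves into the WHERE on tablea.
SELECT tablea.name, tablea.views
FROM tablea
WHERE tablea.balance > 0
  AND EXISTS (
        SELECT 1
        FROM tableb
        WHERE tableb.id = tablea.id
      )
ORDER BY tablea.views ASC
LIMIT 1;
```

Because the filter and the ORDER BY now both refer to tablea, the single-table discussion above may apply. Whether it actually runs faster is something to confirm with EXPLAIN on your own data.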
Etc, etc, etc.
Note that your "0.005 seconds" was "luck".
If you want to dig deeper, provide SHOW CREATE TABLE (so we can see the index(es), etc.), EXPLAIN SELECT (so we can discuss what the Optimizer decided on) and, if possible, EXPLAIN FORMAT=JSON SELECT ... for more details. Also see my indexing cookbook.
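For the tables in the question, gathering that information might look like the sketch below. It assumes the slow query is exactly the first one posted; the output is omitted because it depends on the actual schema and server version.

```sql
-- Table definitions, including all indexes.
SHOW CREATE TABLE tablea;
SHOW CREATE TABLE tableb;

-- Detailed plan for the slow query in JSON form.
EXPLAIN FORMAT=JSON
SELECT tablea.name, tablea.views
FROM tablea
INNER JOIN tableb
        ON tablea.id = tableb.id
       AND tablea.balance > 0
ORDER BY tablea.views ASC
LIMIT 1;
```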
Source: https://stackoverflow.com/questions/41919504/how-to-optimize-mysql-order-by-limit-1-in-queries-that-join-multiple-tables