问题
I have a large query in mysql that involves joining multiple tables together. It's too slow, so I've done "explain" and see that it's creating a temporary table, which I suspect of taking most of the execution time. I found some related information:
- The mysql docs describe conditions when a temporary table might be created. ("The server creates temporary tables under conditions such as these..." [Emphasis added])
- This related SO question Using index, using temporary, using filesort - how to fix this?, which provides a link to the doc and applies it in a specific case.
- This related SO question Order of join conditions important? that talks about the order in which joins are evaluated.
My query does not appear to meet any of the conditions listed in the docs #1, in the order that the joins were written by me. By experimentation, however, I find that if I remove my order by
clause, the temporary table is not created. That makes me look at this rule from the doc:
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
This is the same rule that played in the example at #2 above, but in #2, the OP explicitly had columns from multiple tables in the order by
clause, so that's at least superficially different.
Moreover, when I look at the output from explain
, it appears that the table that I listed first is not used first by the optimizer. Putting down a pseudo-query for example:
select * from A
join B on A.c1=B.c1
join C on A.c2=C.c2
where A.c3='value'
order by A.c4
I would say that my order by
clause does use only columns from the "first table in the join queue" based on the order that I've written the query. On the other hand, the output from explain
suggests that it first considers table B then A.
So here are the questions:
- Does the quoted rule above for use of temporary tables refer to the order that I write the tables or the order that the software chooses to evaluate them?
- If it's the order that I write them, does this mean that the order of the joins does impact performance? (Seems to contradict the claims at #3 above.)
- If it's the order that the software chooses to evaluate them, is there any way to coerce or trick it into selecting and order that doesn't use the table?
回答1:
It refers to the order in which the optimiser evaluates them (join queue). The optimiser may not even be aware of the order of the tables in your sql statement.
No, it does not contradict what's been written in #3, since the answer explicitly writes (emphasis is mine):
has no effect on the result
The result and performance are two different things. Actually, there is an upvoted comment to the answer saying that
But it might affect the query plan (=> performance)
You can tell the optimiser which table to process first by using straight_join:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
However, you need to be careful with that because you tie the optimiser's hand. See this SO topic on discussing advantages and disadvantages of straight_join.
Number of records, where criteria, indexes - they all play their part in the optimiser's decision of the processing order of tables. There is no magic bullet, you need to play around a bit and probably you can trick the optimiser to change the order of the tables.
回答2:
select * from A
join B on A.c1=B.c1
join C on A.c2=C.c2
where A.c3='value'
order by A.c4
The optimizer will use various heuristics to decide which order to look at the tables. In this case, it will start with
A
because of the filters (WHERE...
).This "composite" index on
A
should eliminate the tmp&filesort forORDER BY
:INDEX(c3, c4)
. No that is not the same asINDEX(c3), INDEX(c4)
.After getting the rows from
A
, eitherB
orC
could be reached into ("Nested Loop Join"). These indexes are important:B
:(c1)
andC
:(c2)
.STRAIGHT_JOIN
andFORCE INDEX
are usually a bad idea, and should be used only as a last resort. It may help today's query, but hurt tomorrow.EXPLAIN FORMAT=JSON SELECT ...
gives more information, sometimes even pointing out that two tmp tables are needed.
More tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
来源:https://stackoverflow.com/questions/41728097/clarification-of-join-order-for-creation-of-temporary-tables