问题
Here's a neat one for you (MySQL, obviously):
# Setting things up DROP DATABASE IF EXISTS index_test_gutza; CREATE DATABASE index_test_gutza; USE index_test_gutza; CREATE TABLE customer_order ( id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT, invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0, PRIMARY KEY(id) ); INSERT INTO customer_order (id, invoice) VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); CREATE TABLE customer_invoice ( id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT, invoice_no MEDIUMINT UNSIGNED DEFAULT NULL, invoice_pdf LONGBLOB, PRIMARY KEY(id) ); INSERT INTO customer_invoice (id, invoice_no) VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); # Ok, here's the beef EXPLAIN SELECT co.id FROM customer_order AS co; EXPLAIN SELECT co.id FROM customer_order AS co ORDER BY co.id; EXPLAIN SELECT co.id, ci.invoice_no FROM customer_order AS co LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice; EXPLAIN SELECT co.id, ci.invoice_no FROM customer_order AS co LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice ORDER BY co.id;
There are four EXPLAIN statements at the bottom. The first two result in exactly what you'd expect:
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | co | index | NULL | PRIMARY | 3 | NULL | 5 | Using index | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
The third one is already interesting -- notice how the primary key in customer_order is not used any more:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+ | 1 | SIMPLE | co | ALL | NULL | NULL | NULL | NULL | 5 | | | 1 | SIMPLE | ci | eq_ref | PRIMARY | PRIMARY | 3 | index_test_gutza.co.invoice | 1 | Using index | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
The fourth one, however, is the zinger -- simply adding the ORDER BY on the primary key leads to filesort on customer_order (which is to be expected, given that it was already bewildered above):
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+ | 1 | SIMPLE | co | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort | | 1 | SIMPLE | ci | eq_ref | PRIMARY | PRIMARY | 3 | index_test_gutza.co.invoice | 1 | Using index | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
Filesort! And that while I'm never using anything except the primary key in the customer_order table for ordering, and the primary key in the customer_invoice table for the JOIN. So then, in the name of all that is good and right, why does it switch to filesort all of a sudden?! And more importantly, how do I avoid this? For the record, I will gladly accept a documented answer explaining why this cannot be avoided (if that's the case.)
As you probably suspect by now, this is actually happening in production, and although the tables are by no means huge (only hundreds of records), the filesort on the invoice table (which contains a PDF file) is killing the server when I run queries similar to the one above (which I need in order to know which orders have been issued invoices, and which weren't).
Before you ask, I designed the database, and I thought I was safe storing the PDF files in that table because I never ever need any search queries on it -- I always have its primary key at hand!
Update (comments synopsis)
Here's a synopsis of what was suggested in the comments below, so you don't have to read all of that:
- *You should add a key on customer_order.invoice* -- I actually tried that in production, it makes no difference (as it shouldn't)
- You should use
USE INDEX
-- tried that, didn't work. I also triedFORCE INDEX
-- no result either (no change whatsoever) - You oversimplified the use case, we need the actual production query -- I might have stripped it a bit too much in the first iteration, so I updated it (I just added
, ci.invoice_no
in theSELECT
for the last couple of queries). For the record, if anybody's really curious, here's the production query, exactly as it is (this retrieves the last page of orders):
SELECT corder.id, corder.public_id, CONCAT(buyer.fname," ",buyer.lname) AS buyer_name, corder.status, corder.payment, corder.reserved AS R, corder.tracking_id!="" as A, corder.payment_received as pay_date, invoice.invoice_no AS inv, invoice.receipt_no AS rec, invoice.public AS pub_inv, proforma.proforma_no AS prof, proforma.public AS pub_pf, corder.rating, corder.rating_comments!="" AS got_comment FROM corder LEFT JOIN user as buyer ON buyer.id=corder.buyer LEFT JOIN invoice as invoice ON invoice.id=corder.invoice LEFT JOIN invoice as proforma ON proforma.id=corder.proforma ORDER BY id DESC LIMIT 400, 20;
The query above (which, again, is exactly what I run in production) takes about 14 seconds to run. Here's the simplified query, as shown in the use case above, executed on production:
SELECT corder.id, invoice.invoice_no FROM corder LEFT JOIN invoice ON invoice.id=corder.invoice ORDER BY corder.id DESC LIMIT 400, 20;
This one takes 13 seconds to run. Be advised the LIMIT makes no difference whatsoever as long as we're talking about the last page of results (which we are). That is, there's absolutely no significant difference between retrieving the last 12 results or all 412 results when filesort is involved.
Conclusion
ypercube's answer is not only correct, but unfortunately it seems to be the only legitimate one. I tried to further separate conditions from fields, because the SELECT * FROM corder
subquery can end up involving a lot of data, if corder itself contains LONGBLOBs (and duplicating the fields from the main query in the subquery is inelegant), but unfortunately it doesn't seem to work:
SELECT corder.id, corder.public_id, CONCAT(buyer.fname," ",buyer.lname) AS buyer_name, corder.status, corder.payment, corder.reserved AS R, corder.tracking_id != "" AS A, corder.payment_received AS pay_date, invoice.invoice_no AS inv, invoice.receipt_no AS rec, invoice.public AS pub_inv, proforma.proforma_no AS prof, proforma.public AS pub_pf, corder.rating, corder.rating_comments!="" AS got_comment FROM corder LEFT JOIN user as buyer ON buyer.id = corder.buyer LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma WHERE corder.id IN ( SELECT id FROM corder ORDER BY id DESC LIMIT 400,20 ) ORDER BY corder.id DESC;
This fails, with the following error message:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
I'm using MySQL 5.1.61, which is reasonably recent in the 5.1 family (and apparently this is not supported in 5.5.x either).
回答1:
Can you try this version (it basically gets first the 420 rows of the corder
table, keeps the 20 of them and then does the 3 outer joins):
SELECT
corder.id,
corder.public_id,
CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
corder.status,
corder.payment,
corder.reserved AS R,
corder.tracking_id != "" AS A,
corder.payment_received AS pay_date,
invoice.invoice_no AS inv,
invoice.receipt_no AS rec,
invoice.public AS pub_inv,
proforma.proforma_no AS prof,
proforma.public AS pub_pf,
corder.rating,
corder.rating_comments!="" AS got_comment
FROM
( SELECT *
FROM corder
ORDER BY
id DESC
LIMIT 400, 20
)
AS corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
ORDER BY
corder.id DESC ;
来源:https://stackoverflow.com/questions/13179308/why-doesnt-mysql-use-the-primary-key-on-join-plus-order