Why does MySQL do a full table scan when joining to a very small or empty table, even though I'm using LIMIT?


EDIT: I removed the GROUP BY clause from the example queries, but the same problem remains: "When I join table x to an empty/1-row table y, MySQL makes a full table

3 answers

醉梦人生

2021-01-16 16:11

After some tests, it turns out that if the second table (user_school_mm) has some data, MySQL does not do a full table scan on the first table, but if the second table (country) has no data or very little data (1 or 2 records), MySQL does a full table scan. Why does this happen? I don't know.

How to reproduce

1- Create a schema like this

CREATE TABLE `event` (
   `ev_id` int(11) NOT NULL AUTO_INCREMENT,
   `ev_note` varchar(255) DEFAULT NULL,
   PRIMARY KEY (`ev_id`)
 ) ENGINE=InnoDB;

CREATE TABLE `table1` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `name` varchar(45) DEFAULT NULL,   
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB ;

CREATE TABLE `table2` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `name` varchar(45) DEFAULT NULL,   
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB ;

2- Insert some data into the main table (event in this case); I filled it with 35601000 rows
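
The answer does not say how the table was filled. One possible way, shown here only as a sketch, is to seed a single row and then repeatedly copy the table into itself so the row count doubles each time. The `'seed'` value and the number of repetitions are assumptions for illustration; starting from one row, 25 doublings give 2^25 = 33,554,432 rows, which is the same order of magnitude as in this test.

```sql
-- Hypothetical seed row; the original data is not shown in the answer
INSERT INTO event (ev_note) VALUES ('seed');

-- Each execution doubles the row count; ev_id is assigned by AUTO_INCREMENT.
-- Repeat this statement until the table is large enough.
INSERT INTO event (ev_note)
SELECT ev_note FROM event;

-- Check the current size
SELECT COUNT(*) FROM event;
```

The later doublings copy millions of rows in one statement, so they can take a while and produce large transactions.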

3- Leave table1 empty and insert 15 rows into table2

insert into table2 (name) values 
('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar'),
('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar'),
('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar');

4- Now join the main table with table2, then retest the same query with table1

Query 1 (Fast)

select * 
from 
    event left join 
    table2 on event.ev_id = table2.id
order by event.ev_id
limit 2;
-- executed in 300 milliseconds measured by the client

Explain

+---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+
|id |select_type|table   | type | possible_keys  | key    | key_len | ref              | rows | Extra  |
+---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+
|1  |SIMPLE     |event   |index |                |PRIMARY |4        |                  | 2    |        |
|1  |SIMPLE     |table2  |eq_ref|PRIMARY         |PRIMARY |4        |tests.event.ev_id | 1    |        |
+---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+

Query 2 (Slow)

select * 
from 
    event left join 
    table1 on event.ev_id = table1.id
order by event.ev_id
limit 2;
-- executed in 79 seconds measured by the client

Explain

+---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+
|id |select_type|table   | type | possible_keys  | key    | key_len | ref   | rows    | Extra                                             |
+---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+
|1  |SIMPLE     |event   |ALL   |                |        |         |       |33506704 | Using temporary; Using filesort                   |
|1  |SIMPLE     |table1  |ALL   |PRIMARY         |        |         |       |1        | Using where; Using join buffer (Block Nested Loop)|
+---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+

MySQL version is 5.6.38


终归单人心

2021-01-16 16:13

The MySQL optimizer will decide on join order/method first, and then check whether, for the chosen join order, it is possible to avoid sorting by using an index. For the slow query in this question, the optimizer has decided to use Block-Nested-Loop (BNL) join.

BNL is usually quicker than using an index when one of the tables is very small (and there is no LIMIT).

However, with BNL, rows will not necessarily come in the order given by the first table. Hence, the result of the join needs to be sorted before applying the LIMIT.

You can turn off BNL with: SET optimizer_switch = 'block_nested_loop=off';
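
As a sketch of how this could be tried against the test schema from the other answer (the `event` and `table1` names come from there, not from this answer), the switch can be changed for the current session only, the plan re-checked with EXPLAIN, and the setting restored afterwards:

```sql
-- Disable Block Nested-Loop joins for this session only
SET SESSION optimizer_switch = 'block_nested_loop=off';

-- Re-check the plan of the slow query
EXPLAIN
SELECT *
FROM event
LEFT JOIN table1 ON event.ev_id = table1.id
ORDER BY event.ev_id
LIMIT 2;

-- Restore the flag (BNL is on by default in MySQL 5.6)
SET SESSION optimizer_switch = 'block_nested_loop=on';
```

If the optimizer then reads `event` through its primary key, the "Using temporary; Using filesort" entry should no longer appear for that table, but it is worth confirming on your own data.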

一个人的身影

2021-01-16 16:31
The main reason is the misuse of GROUP BY. Let's take the first query. Even though it is "fast", it is still "wrong":
```
SELECT * 
    FROM users
    LEFT JOIN user_school_mm on users.id = user_school_mm.user_id
    GROUP BY users.id
    ORDER BY users.id ASC
    LIMIT 2
```
A user can go to two schools. The use of the many:many mapping table user_school_mm implies that this is possible. So, after doing the JOIN, you get 2 rows for a single user. But then you GROUP BY users.id to boil it down to a single row. But... which of the two school_id values should be used?
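
One way to remove the ambiguity, given here only as a sketch, is to select just the grouped column plus an explicit aggregate over the school values. It assumes user_school_mm has a `school_id` column, as the paragraph above suggests; the `school_ids` alias is made up for illustration.

```sql
-- Every selected column is either grouped or aggregated,
-- so each user yields exactly one well-defined row
SELECT users.id,
       GROUP_CONCAT(user_school_mm.school_id) AS school_ids
    FROM users
    LEFT JOIN user_school_mm ON users.id = user_school_mm.user_id
    GROUP BY users.id
    ORDER BY users.id ASC
    LIMIT 2
```

Users with no rows in user_school_mm still appear because of the LEFT JOIN, with `school_ids` being NULL.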

I am not going to try to address the performance issues until you present queries that make sense. At that point it will be easier to point out why one query performs better than another.