When joining to a very small/empty table, why does MySQL do a full scan even though I'm using “LIMIT”?

栀梦 2021-01-16 15:54

EDIT: I removed the GROUP BY clause from the example queries, but the same problem remains: "When I join table x to an empty/1 row table y MySQL makes a full table

3 Answers
  • 2021-01-16 16:11

    After some tests, it turns out that if the second table (user_school_mm) has some data, MySQL does not do a full table scan on the first table, whereas if the second table (country) has no data or very little data (1 or 2 records), MySQL does a full table scan. Why does this happen? I don't know.

    How to reproduce

    1- Create a schema like this

    CREATE TABLE `event` (
       `ev_id` int(11) NOT NULL AUTO_INCREMENT,
       `ev_note` varchar(255) DEFAULT NULL,
       PRIMARY KEY (`ev_id`)
     ) ENGINE=InnoDB;
    
    CREATE TABLE `table1` (
       `id` int(11) NOT NULL AUTO_INCREMENT,
       `name` varchar(45) DEFAULT NULL,   
       PRIMARY KEY (`id`)
     ) ENGINE=InnoDB ;
    
    CREATE TABLE `table2` (
       `id` int(11) NOT NULL AUTO_INCREMENT,
       `name` varchar(45) DEFAULT NULL,   
       PRIMARY KEY (`id`)
     ) ENGINE=InnoDB ;
    

    2- Insert some data into the main table (event in this case); I filled it with 35601000 rows
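
    The post does not say how the table was filled. One possible way, sketched here on the assumption that the mysql command-line client is used (DELIMITER is a client command, not server SQL) and with a hypothetical procedure name fill_event, is to seed one row and then repeatedly double the table:

    ```sql
    -- Hypothetical helper, not part of the original post.
    DROP PROCEDURE IF EXISTS fill_event;

    DELIMITER //
    CREATE PROCEDURE fill_event(IN doublings INT)
    BEGIN
        DECLARE i INT DEFAULT 0;

        -- Seed a single row; ev_id is assigned by AUTO_INCREMENT.
        INSERT INTO event (ev_note) VALUES ('seed row');

        -- Each pass copies every existing row, doubling the row count.
        WHILE i < doublings DO
            INSERT INTO event (ev_note) SELECT ev_note FROM event;
            SET i = i + 1;
        END WHILE;
    END //
    DELIMITER ;

    -- Starting from 1 row, n doublings give 2^n rows.
    CALL fill_event(25);
    ```

    With 25 doublings this produces 2^25 rows (about 33.5 million), which is in the same range as the row count mentioned in this step. A smaller argument is enough for a quick trial, although the slowdown is then less visible.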

    3- Leave table1 empty and insert 15 rows into table2

    insert into table2 (name) values 
    ('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar'),
    ('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar'),
    ('fooBar'),('fooBar'),('fooBar'),('fooBar'),('fooBar');
    

    4- Now join the main table with table2, then run the same query against table1

    Query 1 (Fast)

    select * 
    from 
        event left join 
        table2 on event.ev_id = table2.id
    order by event.ev_id
    limit 2;
    -- executed in 300 milliseconds measured by the client
    

    Explain

    +---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+
    |id |select_type|table   | type | possible_keys  | key    | key_len | ref              | rows | Extra  |
    +---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+
    |1  |SIMPLE     |event   |index |                |PRIMARY |4        |                  | 2    |        |
    |1  |SIMPLE     |table2  |eq_ref|PRIMARY         |PRIMARY |4        |tests.event.ev_id | 1    |        |
    +---+-----------+--------+------+----------------+--------+---------+------------------+------+--------+
    

    Query 2 (Slow)

    select * 
    from 
        event left join 
        table1 on event.ev_id = table1.id
    order by event.ev_id
    limit 2;
    -- executed in 79 seconds measured by the client
    

    Explain

    +---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+
    |id |select_type|table   | type | possible_keys  | key    | key_len | ref   | rows    | Extra                                             |
    +---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+
    |1  |SIMPLE     |event   |ALL   |                |        |         |       |33506704 | Using temporary; Using filesort                   |
    |1  |SIMPLE     |table1  |ALL   |PRIMARY         |        |         |       |1        | Using where; Using join buffer (Block Nested Loop)|
    +---+-----------+--------+------+----------------+--------+---------+-------+---------+---------------------------------------------------+
    

    MySQL version is 5.6.38

  • 2021-01-16 16:13

    The MySQL optimizer will decide on join order/method first, and then check whether, for the chosen join order, it is possible to avoid sorting by using an index. For the slow query in this question, the optimizer has decided to use Block-Nested-Loop (BNL) join.

    BNL is usually quicker than using an index when one of the tables is very small (and there is no LIMIT).

    However, with BNL, rows will not necessarily come in the order given by the first table. Hence, the result of the join needs to be sorted before applying the LIMIT.

    You can turn off BNL with set optimizer_switch = 'block_nested_loop=off';
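
    As a minimal sketch of how this could be tried against the slow query from the question, the setting can be changed for the current session only and the plan re-checked. The expectation that the plan then stops showing "Using temporary; Using filesort" follows from the explanation above and has not been verified on the asker's data:

    ```sql
    -- Look at the current optimizer flags before changing anything.
    SELECT @@optimizer_switch;

    -- Disable Block Nested-Loop joins for this session only.
    SET SESSION optimizer_switch = 'block_nested_loop=off';

    -- Re-check the plan of the slow query from the question.
    EXPLAIN
    SELECT *
    FROM event
    LEFT JOIN table1 ON event.ev_id = table1.id
    ORDER BY event.ev_id
    LIMIT 2;

    -- Restore the flag afterwards.
    SET SESSION optimizer_switch = 'block_nested_loop=on';
    ```

    Using SESSION scope keeps the change local to the connection, so other queries that benefit from BNL are not affected.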

  • 2021-01-16 16:31

    The main reason is the misuse of GROUP BY. Let's take the first query. Even though it is "fast", it is still "wrong":

    SELECT * 
        FROM users
        LEFT JOIN user_school_mm on users.id = user_school_mm.user_id
        GROUP BY users.id
        ORDER BY users.id ASC
        LIMIT 2
    

    A user can go to two schools. The use of the many:many mapping user_school_mm implies that this is a possibility. So, after doing the JOIN, you get 2 rows for a single user. But then you GROUP BY users.id to boil it down to a single row. Which of the two school_id values should be used?
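
    One way to make such a query well defined, shown here only as a sketch, is to select the grouped column plus explicit aggregates, so that the result no longer depends on which row MySQL happens to pick. The alias school_ids is an illustrative name, and whether a list of schools is what the asker actually wants is an assumption:

    ```sql
    SELECT users.id,
           -- Collapse all schools of a user into one deterministic value.
           GROUP_CONCAT(user_school_mm.school_id
                        ORDER BY user_school_mm.school_id) AS school_ids
    FROM users
    LEFT JOIN user_school_mm ON users.id = user_school_mm.user_id
    GROUP BY users.id
    ORDER BY users.id ASC
    LIMIT 2;
    ```

    If only one school per user is needed, an aggregate such as MIN(user_school_mm.school_id) would serve the same purpose.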

    I am not going to try to address the performance issues until you present queries that make sense. At that point it will be easier to point out why one query performs better than another.
