Adding limit clause to MySQL query slows it down dramatically

后端 未结 2 1766
时光说笑
时光说笑 2021-01-02 22:28

I\'m trying to troubleshoot a performance issue on MySQL, so I wanted to create a smaller version of a table to work with. When I add a LIMIT clause to the query, it goes f

相关标签:
2条回答
  • 2021-01-02 22:42

    Had the same problem. When I added FORCE INDEX (id) it went back to the few milliseconds of a query it was without the limit, while producing the same results.

    0 讨论(0)
  • 2021-01-02 23:04

    A bit of a long answer but I hope you can learn something from this.

    So based on the evidence in the explain statement you can see that there was two possible indexes that the MySQL query optimizer could have used they are as follows:

    possible_keys
    PRIMARY,insert_date 
    

    However the MySQL query optimizer decided to use the following index:

    key
    insert_date
    

    This is a rare occasion where MySQL query optimizer used the wrong index. Now there is a probable cause for this. You are working on a static development database. You probably restored this from production to do development against.

    When the MySQL optimizer needs to make a decision on which index to use in a query it looks at the statistics around all the possible indexes. You can read more about statistics here http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-other-changes-statistics-estimation.html for a starter.

    So when you update, insert and delete from a table you change the index statistics. It might be that the MySQL server because of the static data had the wrong statistics and chose the wrong index. This however is just a guess at this point as a possible root cause.

    Now lets dive into the indexes. There was two possible indexes to use the primary key index and the index on insert_date. MySQL used the insert_date one. Remember during a query execution MySQL can only use one index always. Lets look at the difference between the primary key index and the insert_date index.

    Simple fact about a primary key index(aka clustered):

    1. A primary key index is normally a btree structure that contains the data rows i.e. it is the table as it contains the date.

    Simple fact about secondary index(aka non-clustered):

    1. A secondary index is normally a btree structure that contains the data being indexed(the columns in the index) and a pointer to the location of the data row on the primary key index.

    This is a subtle but big difference.

    Let me explain when you read a primary key index you are reading the table. The table is in order of the primary index as well. Thus to find a value I would search the index read the data which is 1 operation.

    When you read a secondary index you search the index find the pointer then read the primary key index to find the data based on the pointer. This is essentially 2 operations making the operation of reading a secondary index twice as costly as reading the primary key index.

    In your case since it chose the insert_date as the index to use it was doing double the work just to do the join. That is problem one.

    Now when you LIMIT a recordset it is the last piece of execution of the query. MySQL has to take the entire recordset sort it (if not sorted allready) based on ORDER BY and GROUP BY conditions then take the number of records you want and send it back based on the LIMIT BY section. MySQL has to do a lot of work to keep track of records to send and where it is in the record set etc. LIMIT BY does have a performance hit but I suspect there might be a contributing factor read on.

    Look at your GROUP BY it is by player_id. The index that is used is insert_date. GROUP BY essentially orders your record set, however since it had no index to use for ordering (remember a index is sorted in the order of the column(s) contained in it). Essentially you were asking sort/order on player_id and the index used was sorted on insert_date.

    This step caused the filesort problem which essentially takes the data that is returned from reading the secondary index and primary index(remember the 2 operations) and then has to sort them. Sorting is normally done on disk as it is a very very expensive operation to do in memory. Thus the entire query result was written to disk and sorted painfully slow to get you your results.

    By removing the insert_date index MySQL will now use the primary key index which means the data is ordered(ORDER BY/GROUP BY) player_id and insert_date. This will eliminate the need to read the secondary index and then use the pointer to read the primary key index i.e. the table, and since the data is already sorted MySQL has very little work when applying the GROUP BY piece of the query.

    Now the following is a bit of a educated guess again if you could post the results of the explain statement after the index was dropped I would probably be able to confirm my thinking. So by using the wrong index the results were sorted on disk to apply the LIMIT BY properly. Removing the LIMIT BY allows MySQL to probably sort in Memory as it does not have to apply the LIMIT BY and keep track of what is being returned. The LIMIT BY probably caused the temporary table to be created. Once again difficult to say without seeing the difference between the statements i.e. output of explain.

    Hopefully this gives you a better understanding of indexes and why they are a double edged sword.

    0 讨论(0)
提交回复
热议问题