问题
I have a following table
CREATE TABLE `test_series_analysis_data` (
`email` varchar(255) NOT NULL,
`mappingId` int(11) NOT NULL,
`packageId` varchar(255) NOT NULL,
`sectionName` varchar(255) NOT NULL,
`createdAt` datetime(3) DEFAULT NULL,
`marksObtained` float NOT NULL,
`updatedAt` datetime DEFAULT NULL,
`testMetaData` longtext,
PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Following is the output of the explain for the queries:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | rank_index,mapping_package | rank_index | 4 | const | 20160 | 1.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
Query optimizer could have used both indexes but rank_index is a covering index so it got picked. What surprises me is the output of the following query:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta use index (mapping_package)
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | mapping_package | mapping_package | 4 | const | 19434 | 1.00 | Using index condition |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
Why are there rows
lesser (19434<20160) when the index being used is mapping_package. rank_index can better select what is required so the row count should be lesser in rank_index.
So does this mean mapping_package index is better than rank_index for the given query?
Does it have any effect that sectionName is a varchar so both indexes should give similar performance?
Also I am assuming Using index condition
is selecting only few rows from index and scanning some more. While in case Using where; Using index
, optimizer has to only read the index and not table to get rows and then it is selecting some data. Then why Using where
is missing while using rank_index?
Moreover why is the key_len for mapping_package is 4 when there are only two columns in the index?
Help appreciated.
回答1:
(19434<20160)
-- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you did ANALYZE TABLE
, both would change, possibly changing the inequality.
Notice something else: Using where; Using index
versus Using index condition
.
But first, let me remind you that, in InnoDB, the PRIMARY KEY
columns are tacked onto the secondary key. So, effectively you have
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)
Now let's decide what the optimal index should be: where mappingId = ?1 and packageId = ?2 and sectionName = ?3 order by marksObtained desc
- First, the
=
parts ofWHERE
:mappingId
,packageId
,sectionName
, in any order; - Then the
ORDER BY
column(s):marksObtained
- Bonus: Finally if
email
(the only other column mentioned anywhere in theSELECT
) is in the key, it will be "Covering".
This says that rank_index
is "perfect", and the other index is not so good. Alas, EXPLAIN
does not clearly say that.
You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)
Other tips:
- Don't blindly use
(255)
. When a tmp table is needed, this can make the the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or... - If this is a huge table, you really ought to 'normalize' the strings, replacing them with maybe a 2-byte
SMALLINT UNSIGNED
. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)
Why is key_len
4? That implies that one column was used, namely the 4-byte INT
mappingId
. I would have expected it to use the second column, too. So, I am stumped. EXPLAIN FORMAT=JSON SELECT ...
may provide more clues.
来源:https://stackoverflow.com/questions/41620312/mysql-not-picking-correct-row-count-from-index