MySQL performance of query making addition of columns in where clause

回眸只為那壹抹淺笑 submitted on 2020-01-15 11:17:44

Question


I have a query that adds several column values together in the WHERE clause. I can't precompute this sum in a single column because the combination of columns to use varies between queries. My problem is that my table is very large (several hundred million rows) and performance is very bad.

Example table:

+---------+------------+--------+--------+--------+--------+
| tableId | categoryId | value1 | value2 | value3 | value4 |
+---------+------------+--------+--------+--------+--------+
|       1 |          1 |      1 |      0 |      5 |      7 |
|       2 |          1 |      8 |      1 |      7 |      0 |
|       3 |          1 |     10 |      5 |      0 |     20 |
|       4 |          2 |      0 |     15 |      0 |     22 |
|       5 |          2 |     20 |      0 |     11 |      0 |
+---------+------------+--------+--------+--------+--------+

Example queries:

SELECT * FROM myTable WHERE categoryId = 1 AND (value1 + value2 + value3 + value4) > 9;
SELECT * FROM myTable WHERE categoryId = 1 AND (value1 + value3 + value4) > 5;

What would be the best strategy for improving the performance of such queries? (Edit: I already have an index on categoryId; it does not help.)

Does using an index help for such queries? Would I then have to create all possible indexes for all possible combinations of columns? Wouldn't the resulting indexes be very very large?

ALTER TABLE myTable
ADD INDEX(categoryId, value1),
ADD INDEX(categoryId, value2),
ADD INDEX(categoryId, value3),
ADD INDEX(categoryId, value4),
ADD INDEX(categoryId, value1, value2),
ADD INDEX(categoryId, value1, value3),
ADD INDEX(categoryId, value1, value4),
etc

Or maybe create a link table, with boolean fields specifying which columns were used? But that would result in a table with several billion rows, so I'm not sure this is better...

+---------+-----------+-----------+-----------+-----------+----------+
| tableId | useValue1 | useValue2 | useValue3 | useValue4 | valueSum |
+---------+-----------+-----------+-----------+-----------+----------+
|       1 |         1 |         1 |         1 |         1 |       13 |
|       1 |         1 |         1 |         1 |         0 |        6 |
|       1 |         1 |         1 |         0 |         0 |        1 |
|       1 |         1 |         1 |         0 |         1 |        8 |
|       1 |         1 |         0 |         1 |         1 |       13 |
|       1 |         1 |         0 |         1 |         0 |        6 |
|       1 |         1 |         0 |         0 |         0 |        1 |
|       1 |         1 |         0 |         0 |         1 |        8 |
|       1 |         0 |         1 |         1 |         1 |       12 |
|       1 |         0 |         1 |         1 |         0 |        5 |
|       1 |         0 |         1 |         0 |         0 |        0 |
|       1 |         0 |         1 |         0 |         1 |        7 |
|       1 |         0 |         0 |         1 |         1 |       12 |
|       1 |         0 |         0 |         1 |         0 |        5 |
|       1 |         0 |         0 |         0 |         1 |        7 |
+---------+-----------+-----------+-----------+-----------+----------+

With an index:

ALTER TABLE linkTable ADD INDEX(tableId, useValue1, useValue2, useValue3, useValue4, valueSum);

Any other ideas?


Answer 1:


@e4c5 is right that none of the indices will help with the current query. You can start by adding the following indices and altering the query with additional conditions so that the indices get used:

ALTER TABLE myTable
ADD INDEX(categoryId, value1),
ADD INDEX(categoryId, value2),
ADD INDEX(categoryId, value3),
ADD INDEX(categoryId, value4);

And update the query like this:

SELECT * FROM myTable WHERE categoryId = 1 AND (value1 <= 9) AND (value2 <= 9) AND (value3 <= 9) AND (value4 <= 9) AND (value1 + value2 + value3 + value4) > 9;
SELECT * FROM myTable WHERE categoryId = 1 AND (value1 <= 5) AND (value3 <= 5) AND (value4 <= 5) AND (value1 + value3 + value4) > 5;

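The additional conditions help narrow down the number of rows to be processed. Be aware that they are not implied by the sum condition: they also exclude rows in which any single value exceeds the threshold (tableId 3 in the example table, where value1 = 10 and value4 = 20), so use them only if dropping such rows is acceptable. Adding indices on more columns would speed this up further, but I suggest trying this first.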
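As a hedged sketch of a variant that cannot change the result set: if the value columns hold integers, four values can only sum to more than 9 when at least one of them is greater than 2 (four values of at most 2 sum to at most 8). A redundant OR predicate built on that fact is always implied by the sum condition. Whether MySQL actually uses the per-column indexes for it (for example through an index merge) depends on the server version and the data distribution, and the predicate may not be very selective, so check the plan with EXPLAIN. Integer columns are an assumption here; the thresholds 2 and 1 are floor(9/4) and floor(5/3).

```sql
-- Assumption: value1..value4 are integer columns.
-- If every value were <= 2, the sum would be <= 8, so "sum > 9" implies
-- that at least one value is > 2. The OR predicate is therefore redundant
-- and only gives the optimizer something it can match against an index.
EXPLAIN
SELECT *
FROM myTable
WHERE categoryId = 1
  AND (value1 > 2 OR value2 > 2 OR value3 > 2 OR value4 > 2)
  AND (value1 + value2 + value3 + value4) > 9;

-- Three columns, threshold 5: floor(5 / 3) = 1.
EXPLAIN
SELECT *
FROM myTable
WHERE categoryId = 1
  AND (value1 > 1 OR value3 > 1 OR value4 > 1)
  AND (value1 + value3 + value4) > 5;
```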




Answer 2:


I'm going to have to make some guesses until I see SHOW CREATE TABLE...

If you have this:

tableId INT UNSIGNED AUTO_INCREMENT NOT NULL,
categoryId INT UNSIGNED NOT NULL,
...
PRIMARY KEY(tableId),

Then change to

tableId INT UNSIGNED AUTO_INCREMENT NOT NULL,  -- same
categoryId INT UNSIGNED NOT NULL,              -- same
...
PRIMARY KEY(categoryId, tableId),  -- different, see Note 1
INDEX(tableId)                     -- different, see Note 2

Note 1. The index (the PK) starting with categoryId will help the queries you presented. Furthermore, by being at the beginning of the PK, it will "cluster" all the necessary rows for one SELECT together, thereby minimizing the I/O in your huge table.

Note 2. Yes, it is OK to have only INDEX(...) for the AUTO_INCREMENT.
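The change described in the two notes could be applied in a single statement along these lines. This is only a sketch: it assumes the table currently has PRIMARY KEY(tableId), as guessed above, and on a table of this size the rebuild can take a long time, so it is worth trying on a copy first.

```sql
-- Assumes the current definition is PRIMARY KEY(tableId), as guessed above.
-- All three changes go in one ALTER so that the AUTO_INCREMENT column
-- is never left without an index.
ALTER TABLE myTable
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (categoryId, tableId),
    ADD INDEX (tableId);

-- Confirm the resulting definition.
SHOW CREATE TABLE myTable;
```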

Another tip: BIGINT is always 8 bytes and INT is always 4 bytes. Do you really need columns that big? Shrinking the column sizes will help cut down on I/O, which will significantly speed up the queries. MEDIUMINT UNSIGNED is only 3 bytes and has a range of 0..16M; etc.
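A minimal sketch of how the column sizes might be checked and reduced. The actual column types are unknown, so the target type below is purely an assumption for illustration (SMALLINT UNSIGNED is 2 bytes with a range of 0..65535), as is the absence of NULLs and negative values.

```sql
-- Check the actual range of the data before choosing a smaller type.
SELECT MIN(value1), MAX(value1),
       MIN(value2), MAX(value2),
       MIN(value3), MAX(value3),
       MIN(value4), MAX(value4)
FROM myTable;

-- Hypothetical outcome: every value lies in 0..65535 and none is NULL.
-- Combining the changes in one ALTER avoids repeated table rebuilds.
ALTER TABLE myTable
    MODIFY value1 SMALLINT UNSIGNED NOT NULL,
    MODIFY value2 SMALLINT UNSIGNED NOT NULL,
    MODIFY value3 SMALLINT UNSIGNED NOT NULL,
    MODIFY value4 SMALLINT UNSIGNED NOT NULL;
```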




Answer 3:


Based on answers in my follow-up question about the overall database design, the conclusions are:

  • All my data types and indexes are correct.
  • My design with enumerated columns is not very elegant, but it suits a row-based database such as MySQL and gives the best performance on that kind of engine.
  • To really fix this performance issue, I should move to a column-based database, using a better design as described in the comments on my other question (where the data to aggregate would be in the same column but across several rows).
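The other question is not reproduced here, so the following is only one plausible reading of "the same column but across several rows", written in MySQL syntax for illustration. The table name myValues and the columns valueNo and amount are hypothetical.

```sql
-- Hypothetical layout: one row per (tableId, value number) instead of
-- one column per value. All names here are illustrative.
CREATE TABLE myValues (
    categoryId INT UNSIGNED NOT NULL,
    tableId    INT UNSIGNED NOT NULL,
    valueNo    TINYINT UNSIGNED NOT NULL,  -- 1..4, replaces value1..value4
    amount     INT UNSIGNED NOT NULL,
    PRIMARY KEY (categoryId, tableId, valueNo)
);

-- Counterpart of "value1 + value3 + value4 > 5" for category 1;
-- returns the matching tableId values only.
SELECT tableId
FROM myValues
WHERE categoryId = 1
  AND valueNo IN (1, 3, 4)
GROUP BY tableId
HAVING SUM(amount) > 5;
```

With this shape the set of values to add becomes a filter on valueNo rather than a different expression per query, which is the kind of aggregation a column-based engine is built for.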



Answer 4:


You can group your queries into categories. For each category, keep a precomputed column that holds the corresponding sum. Each query then filters on the column that matches the combination it needs. Of course, this is only possible if your queries can be categorized.
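One way to maintain such precomputed columns, sketched here under assumptions, is a stored generated column per query category together with an index that starts with categoryId. This requires MySQL 5.7 or later; the column names sum1234 and sum134 are hypothetical, and the INT UNSIGNED type assumes non-negative values whose sums fit in that range.

```sql
-- Assumptions: MySQL 5.7+ (generated columns); non-negative values whose
-- sums fit in INT UNSIGNED. sum1234 and sum134 are hypothetical names,
-- one per category of query.
ALTER TABLE myTable
    ADD COLUMN sum1234 INT UNSIGNED AS (value1 + value2 + value3 + value4) STORED,
    ADD COLUMN sum134  INT UNSIGNED AS (value1 + value3 + value4) STORED,
    ADD INDEX (categoryId, sum1234),
    ADD INDEX (categoryId, sum134);

-- The two example queries, rewritten against the precomputed columns.
SELECT * FROM myTable WHERE categoryId = 1 AND sum1234 > 9;
SELECT * FROM myTable WHERE categoryId = 1 AND sum134 > 5;
```

Each extra column and index adds storage and write cost, so this only pays off when the number of distinct combinations in use is small.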



Source: https://stackoverflow.com/questions/42779175/mysql-performance-of-query-making-addition-of-columns-in-where-clause