A SQL query searching for rows that satisfy Column1 <= X <= Column2 is very slow

盖世英雄少女心 2021-01-11 16:27

I am using a MySQL DB, and have the following table:

CREATE TABLE SomeTable (
  PrimaryKeyCol BIGINT(20) NOT NULL,
  A BIGINT(20) NOT NULL,
  FirstX INT(11) N         


        
12 Answers
  •  时光说笑
    2021-01-11 17:06

    WHERE col1 < ... AND ... < col2 is virtually impossible to optimize.

    Any useful query will involve a "range" on either col1 or col2. Two ranges (on two different columns) cannot be used in a single INDEX.

    Therefore, any index you try risks checking a lot of the table. INDEX(col1, ...) will scan from the start of the index up to where col1 passes .... Similarly, INDEX(col2, ...) will scan from where col2 reaches ... all the way to the end.
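To see the cost concretely, here is a minimal Python sketch. It models INDEX(col1) as the ranges sorted by col1 and counts how many index entries a scan for `col1 <= x AND x <= col2` must visit, compared with how many rows actually match. The helper name `scan_cost` and the sample data are hypothetical. Real InnoDB cost also depends on the optimizer and on page reads, so treat this only as an illustration of the scanning pattern.

```python
import bisect

def scan_cost(ranges, x):
    """Model a range scan on INDEX(col1) for: col1 <= x AND x <= col2.

    Every index entry with col1 <= x must be visited, because col2 is
    not ordered within that part of the index. Returns
    (entries_visited, rows_matching).
    """
    by_col1 = sorted(ranges)                     # index order: by col1
    lows = [lo for lo, _ in by_col1]
    stop = bisect.bisect_right(lows, x)          # count of entries with col1 <= x
    visited = by_col1[:stop]
    matches = [r for r in visited if x <= r[1]]  # col2 is only a filter here
    return len(visited), len(matches)

# Hypothetical data: 1000 ranges of width 50, spaced 100 apart.
ranges = [(i * 100, i * 100 + 50) for i in range(1000)]
print(scan_cost(ranges, 90_010))  # (901, 1): 901 entries visited for 1 match
```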

    To add to your woes, the ranges overlap. So you can't pull a fast one and add ORDER BY ... LIMIT 1 to stop quickly. And if you say LIMIT 10 but there are only 9 matches, the scan won't stop until it reaches the start (or end) of the table.

    One simple thing you can do (but it won't speed things up by much) is to swap the PRIMARY KEY and the UNIQUE key. This could help because InnoDB "clusters" the PK with the data.

    If the ranges did not overlap, I would point you at http://mysql.rjweb.org/doc.php/ipranges .

    So, what can be done? How "even" and "small" are the ranges? If they are reasonably 'nice', then the following would take some code, but should be a lot faster. (In your example, 100000 500000 is pretty ugly, as you will see in a minute.)

    Define buckets to be, say, floor(number/100). Then build a table that correlates buckets and ranges. Samples:

    FirstX  LastX  Bucket
    123411  123488  1234
    222222  222444  2222
    222222  222444  2223
    222222  222444  2224
    222411  222477  2224
    

    Notice how some ranges 'belong' to multiple buckets.
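Building that table means emitting one row per bucket that a range touches. Here is a minimal Python sketch, assuming the bucket size of 100 used above. The function name `bucket_rows` is hypothetical.

```python
BUCKET_SIZE = 100  # floor(number/100), as in the answer; tune for your data

def bucket_rows(first_x, last_x, size=BUCKET_SIZE):
    """Return one (FirstX, LastX, Bucket) row for every bucket the range touches."""
    return [(first_x, last_x, b)
            for b in range(first_x // size, last_x // size + 1)]

for first_x, last_x in [(123411, 123488), (222222, 222444), (222411, 222477)]:
    for row in bucket_rows(first_x, last_x):
        print(*row)  # reproduces the five sample rows above
```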

    Then, the search is first on the bucket(s) in the query, then on the details. Looking up X=222433 finds two rows with bucket=2224, and both pass the firstX/lastX check. For X=222466, the same two rows have that bucket, but only one of them satisfies firstX <= X <= lastX.

    WHERE bucket = FLOOR(X/100)
      AND firstX <= X
      AND X <= lastX
    

    with

    INDEX(bucket, firstX)
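The lookup can be tried end to end with the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL, with a hypothetical table `RangeBuckets` holding only the bucket and range columns. A real table would presumably also carry the original row's primary key. The bucket is computed client-side with `x // 100`, which matches FLOOR(X/100) for non-negative X.

```python
import sqlite3

BUCKET_SIZE = 100

def build(conn, ranges, size=BUCKET_SIZE):
    conn.execute("CREATE TABLE RangeBuckets ("
                 " bucket INTEGER NOT NULL,"
                 " firstX INTEGER NOT NULL,"
                 " lastX INTEGER NOT NULL)")
    # Same shape as INDEX(bucket, firstX) in the answer
    conn.execute("CREATE INDEX bucket_first ON RangeBuckets (bucket, firstX)")
    conn.executemany(
        "INSERT INTO RangeBuckets (bucket, firstX, lastX) VALUES (?, ?, ?)",
        [(b, lo, hi) for lo, hi in ranges
         for b in range(lo // size, hi // size + 1)])

def lookup(conn, x, size=BUCKET_SIZE):
    # Equality on bucket first, then the firstX/lastX details
    return conn.execute(
        "SELECT firstX, lastX FROM RangeBuckets"
        " WHERE bucket = ? AND firstX <= ? AND ? <= lastX"
        " ORDER BY firstX",
        (x // size, x, x)).fetchall()

conn = sqlite3.connect(":memory:")
build(conn, [(123411, 123488), (222222, 222444), (222411, 222477)])
print(lookup(conn, 222433))  # [(222222, 222444), (222411, 222477)]
print(lookup(conn, 222466))  # [(222411, 222477)]
```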
    

    But... with 100000 500000, there would be 4001 rows because this range is in that many 'buckets'.
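The 4001 figure is the count of bucket numbers from FLOOR(FirstX/100) through FLOOR(LastX/100), inclusive, using the bucket size of 100 from above:

```latex
n_{\text{buckets}}
  = \left\lfloor \frac{\text{LastX}}{100} \right\rfloor
  - \left\lfloor \frac{\text{FirstX}}{100} \right\rfloor + 1
  = \left\lfloor \frac{500000}{100} \right\rfloor
  - \left\lfloor \frac{100000}{100} \right\rfloor + 1
  = 5000 - 1000 + 1 = 4001
```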

    Plan B (to tackle the wide ranges)

    Segregate the ranges into wide and narrow. Do the wide ranges with a simple table scan, do the narrow ranges via my bucket method, and UNION ALL the results together. Hopefully the "wide" table would be much smaller than the "narrow" table.
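Plan B could look roughly like the sketch below, again using `sqlite3` as a stand-in for MySQL. The table names `Narrow` and `Wide` are hypothetical. So is the cut-off `MAX_BUCKETS = 10`, which sends any range touching more than 10 buckets to the wide table; the right threshold depends on your data. Narrow ranges are looked up through the bucket index, the wide table is simply scanned, and the two results are combined with UNION ALL.

```python
import sqlite3

BUCKET_SIZE = 100
MAX_BUCKETS = 10  # hypothetical cut-off between "narrow" and "wide" ranges

def build(conn, ranges, size=BUCKET_SIZE, max_buckets=MAX_BUCKETS):
    conn.execute("CREATE TABLE Narrow (bucket INTEGER, firstX INTEGER, lastX INTEGER)")
    conn.execute("CREATE INDEX narrow_bucket_first ON Narrow (bucket, firstX)")
    conn.execute("CREATE TABLE Wide (firstX INTEGER, lastX INTEGER)")
    for lo, hi in ranges:
        buckets = range(lo // size, hi // size + 1)
        if len(buckets) > max_buckets:
            # Too many bucket rows: store the range once and scan it later
            conn.execute("INSERT INTO Wide VALUES (?, ?)", (lo, hi))
        else:
            conn.executemany("INSERT INTO Narrow VALUES (?, ?, ?)",
                             [(b, lo, hi) for b in buckets])

def lookup(conn, x, size=BUCKET_SIZE):
    return sorted(conn.execute(
        "SELECT firstX, lastX FROM Narrow"
        " WHERE bucket = ? AND firstX <= ? AND ? <= lastX"
        " UNION ALL"
        " SELECT firstX, lastX FROM Wide"  # plain scan of the (small) wide table
        " WHERE firstX <= ? AND ? <= lastX",
        (x // size, x, x, x, x)).fetchall())

conn = sqlite3.connect(":memory:")
build(conn, [(100000, 500000), (222222, 222444), (222411, 222477)])
print(lookup(conn, 222433))  # [(100000, 500000), (222222, 222444), (222411, 222477)]
print(lookup(conn, 600000))  # []
```

With this split, the 100000 500000 range is stored as one row instead of 4001.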
