A SQL query searching for rows that satisfy Column1 <= X <= Column2 is very slow


盖世英雄少女心 2021-01-11 16:27

I am using a MySQL DB, and have the following table:

CREATE TABLE SomeTable (
  PrimaryKeyCol BIGINT(20) NOT NULL,
  A BIGINT(20) NOT NULL,
  FirstX INT(11) N


      
      
        
12 answers

时光说笑 (original poster) 2021-01-11 17:06

WHERE col1 < ... AND ... < col2 is virtually impossible to optimize.

Any useful query will involve a "range" on either col1 or col2.  Two ranges (on two different columns) cannot be used in a single INDEX.

Therefore, any index you try risks checking a lot of the table:
INDEX(col1, ...) will scan from the start of the index to where col1 hits ... .  Similarly, INDEX(col2, ...) will scan from where col2 hits ... until the end.

To add to your woes, the ranges are overlapping.  So, you can't pull a fast one and add ORDER BY ... LIMIT 1 to stop quickly.  And if you say LIMIT 10, but there are only 9, it won't stop until the start/end of the table.

One simple thing you can do (but it won't speed things up by much) is to swap the PRIMARY KEY and the UNIQUE.  This could help because InnoDB "clusters" the PK with the data.

If the ranges did not overlap, I would point you at http://mysql.rjweb.org/doc.php/ipranges .

So, what can be done?  How "even" and "small" are the ranges?  If they are reasonably 'nice', then the following would take some code, but should be a lot faster.  (In your example, the range 100000 to 500000 is pretty ugly, as you will see in a minute.)

Define buckets to be, say, floor(number/100).  Then build a table that correlates buckets and ranges.  Samples:

FirstX  LastX   Bucket
123411  123488  1234
222222  222444  2222
222222  222444  2223
222222  222444  2224
222411  222477  2224


Notice how some ranges 'belong' to multiple buckets.
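
As a rough sketch of how those bucket rows could be generated (this is an illustration, not code from the original setup; the function name `bucket_rows` and the `width` parameter are made up for the example), each range gets one row for every bucket between floor(FirstX/100) and floor(LastX/100):

```python
def bucket_rows(first_x, last_x, width=100):
    """Return one (first_x, last_x, bucket) row per bucket the range touches."""
    first_bucket = first_x // width   # same as floor(first_x / width) for integers
    last_bucket = last_x // width
    return [(first_x, last_x, b) for b in range(first_bucket, last_bucket + 1)]


# Rebuild the sample table from the three ranges used above.
rows = []
for first_x, last_x in [(123411, 123488), (222222, 222444), (222411, 222477)]:
    rows.extend(bucket_rows(first_x, last_x))

for row in rows:
    print(row)
```

The wider the range relative to the bucket width, the more rows it produces, which is what makes very wide ranges expensive with this scheme.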

Then, the search is first on the bucket(s) in the query, then on the details.  Looking for X=222433 would find two rows with bucket=2224, then decide that both are OK.  But for X=222466, two rows have the bucket, but only one matches with firstX and lastX.

WHERE bucket = FLOOR(X/100)
  AND firstX <= X
  AND X <= lastX


with

INDEX(bucket, firstX)
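
To see the lookup end to end, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The table name `RangeBuckets`, the index name, and the helper `find_ranges` are assumptions for the example, and the bucket is computed in Python rather than with FLOOR(X/100) in SQL. Query plans in MySQL will differ, so treat this only as a check of the logic:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RangeBuckets (firstX INTEGER, lastX INTEGER, bucket INTEGER)"
)
conn.execute("CREATE INDEX idx_bucket_first ON RangeBuckets (bucket, firstX)")
conn.executemany(
    "INSERT INTO RangeBuckets VALUES (?, ?, ?)",
    [
        (123411, 123488, 1234),
        (222222, 222444, 2222),
        (222222, 222444, 2223),
        (222222, 222444, 2224),
        (222411, 222477, 2224),
    ],
)


def find_ranges(x, width=100):
    """Return the (firstX, lastX) ranges containing x, looking only in x's bucket."""
    cur = conn.execute(
        "SELECT firstX, lastX FROM RangeBuckets "
        "WHERE bucket = ? AND firstX <= ? AND ? <= lastX "
        "ORDER BY firstX",
        (x // width, x, x),
    )
    return cur.fetchall()


print(find_ranges(222433))  # both rows in bucket 2224 match
print(find_ranges(222466))  # only the 222411..222477 row matches
```

Because the query pins a single bucket value, a range stored under several buckets is still returned only once.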


But... with the range 100000 to 500000, there would be 4001 rows, because this range falls in that many 'buckets' (1000 through 5000).

Plan B (to tackle the wide ranges)

Segregate the ranges into wide and narrow.  Do the wide ranges by a simple table scan, do the narrow ranges via my bucket method.  UNION ALL the results together.  Hopefully the "wide" table would be much smaller than the "narrow" table.
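
A possible shape for Plan B, again sketched with Python's sqlite3 rather than MySQL. Everything named here is hypothetical: the tables `WideRanges` and `NarrowBuckets`, the helpers `add_range` and `find_ranges`, and especially the `WIDE_LIMIT` cutoff of 1000, which you would have to tune against your own data:

```python
import sqlite3

WIDTH = 100        # bucket width, as in the floor(number/100) example
WIDE_LIMIT = 1000  # hypothetical cutoff: longer ranges go to the "wide" table

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WideRanges (firstX INTEGER, lastX INTEGER)")
conn.execute(
    "CREATE TABLE NarrowBuckets (firstX INTEGER, lastX INTEGER, bucket INTEGER)"
)
conn.execute("CREATE INDEX idx_narrow ON NarrowBuckets (bucket, firstX)")


def add_range(first_x, last_x):
    """Store a wide range once; store a narrow range once per bucket it touches."""
    if last_x - first_x > WIDE_LIMIT:
        conn.execute("INSERT INTO WideRanges VALUES (?, ?)", (first_x, last_x))
    else:
        buckets = range(first_x // WIDTH, last_x // WIDTH + 1)
        conn.executemany(
            "INSERT INTO NarrowBuckets VALUES (?, ?, ?)",
            [(first_x, last_x, b) for b in buckets],
        )


for r in [(100000, 500000), (123411, 123488), (222222, 222444), (222411, 222477)]:
    add_range(*r)


def find_ranges(x):
    """Scan the wide table, probe one bucket of the narrow table, UNION ALL both."""
    cur = conn.execute(
        "SELECT firstX, lastX FROM WideRanges WHERE firstX <= ? AND ? <= lastX "
        "UNION ALL "
        "SELECT firstX, lastX FROM NarrowBuckets "
        "WHERE bucket = ? AND firstX <= ? AND ? <= lastX",
        (x, x, x // WIDTH, x, x),
    )
    return sorted(cur.fetchall())


print(find_ranges(222433))
```

With this split, the 100000 to 500000 range costs one row in the wide table instead of 4001 bucket rows, at the price of scanning the wide table on every lookup.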
    
             
                                                        
            
            
              
                