I have a MySQL query that is generated by a PHP script. The query will look something like this:
SELECT * FROM Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0 AND RHD_No IN (...)
The problem is that IN is basically treated as a bunch of ORs: col IN (1,2,3) is equivalent to col = 1 OR col = 2 OR col = 3. This is a LOT slower than a join.
What you should do is generate SQL that creates a temporary table, populates it with the values from the IN clause, and then joins against that temp table:
CREATE TEMPORARY TABLE numbers (n INT)
Then in a loop, add
INSERT INTO numbers VALUES ($next_number)
Then at the end
SELECT Recipe_Data.*
FROM numbers
JOIN Recipe_Data ON Recipe_Data.RHD_No = numbers.n
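The whole temp-table pattern can be sketched end to end. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for MySQL (the SQL dialect differs slightly, but the idea is the same); the table and column names come from the question, and the row data is made up:

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; Recipe_Data and RHD_No are
# the names from the question, the row values are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Recipe_Data (RHD_No INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO Recipe_Data VALUES (?, ?)',
                 [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')])

wanted = [2, 4]  # the values that would otherwise go into the IN clause

# Create and populate the temporary table, then join against it.
conn.execute('CREATE TEMPORARY TABLE numbers (n INT)')
conn.executemany('INSERT INTO numbers VALUES (?)', [(v,) for v in wanted])
rows = conn.execute(
    'SELECT Recipe_Data.* FROM numbers '
    'JOIN Recipe_Data ON Recipe_Data.RHD_No = numbers.n '
    'ORDER BY Recipe_Data.RHD_No').fetchall()
print(rows)  # -> [(2, 'b'), (4, 'd')]
```

In the PHP version the executemany step would simply be the insert loop described above.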
I'm going to gamble here and suggest that executing the following query just once to create an index suitable for your query should reduce the query time by at least a second...
CREATE INDEX returnstatus ON Recipe_Data(404_Without_200,Failures_Without_Success)
See: http://dev.mysql.com/doc/refman/5.0/en/create-index.html for creating indexes, and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html for how indexes are used in queries.
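A quick way to check that such an index actually gets picked up is to look at the query plan. Here is a sketch using Python's built-in sqlite3 as a stand-in for MySQL (identifier quoting and planner output differ — sqlite uses double quotes where MySQL uses backticks for a column name starting with a digit — but the idea carries over); table and column names are taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names from the question; an identifier starting with a digit
# must be quoted ("..." in sqlite, backticks in MySQL).
conn.execute('CREATE TABLE Recipe_Data ('
             'RHD_No INTEGER PRIMARY KEY, '
             '"404_Without_200" INT, '
             'Failures_Without_Success INT)')
conn.execute('CREATE INDEX returnstatus ON Recipe_Data'
             '("404_Without_200", Failures_Without_Success)')

plan = conn.execute('EXPLAIN QUERY PLAN SELECT * FROM Recipe_Data '
                    'WHERE "404_Without_200" = 0 '
                    'AND Failures_Without_Success = 0').fetchall()
# Last field of the plan row names the access path, e.g.
# "SEARCH Recipe_Data USING INDEX returnstatus (...)"
print(plan[0][-1])
```

The MySQL equivalent is EXPLAIN SELECT ..., which should show returnstatus in the key column once the index exists.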
Failing that, view all running processes on MySQL to see if a currently running query from any source refuses to die while consuming all the server's time, and kill it. See: http://dev.mysql.com/doc/refman/5.0/en/kill.html
Failing that, determine what else each record may have in common, so you can avoid referencing each one individually by ID number in your IN statement. If necessary, add another table column to track that commonality. Then add the column(s) having that commonality to the above index and filter by them in your WHERE clause instead of using the IN statement. For example, if you want only certain ID numbers to appear on the page, add a visible column of type tinyint, with value 0 to exclude and value 1 to include in your search results; then add the visible column to your indexes and WHERE clause to speed up the query. You wouldn't need that IN statement at all.
Perhaps your IN statement is dynamically built using a previous query. If that's the case, try pulling all rows from Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0. Then, in your PHP script, simply discard a record in your fetch loop if its RHD_No doesn't match an expected value.
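The discard-in-the-fetch-loop idea looks like this in Python rather than PHP; `fetched` stands in for the rows the broader WHERE query would return, and `expected` for the IDs the IN clause would have listed (both made up for the sketch):

```python
# Rows as the broader WHERE query might return them (invented data);
# each dict stands in for one fetched database row.
fetched = [{'RHD_No': 1}, {'RHD_No': 2}, {'RHD_No': 3}, {'RHD_No': 4}]
expected = {2, 4}  # a set makes each membership test O(1)

# Keep only the rows whose RHD_No was actually wanted.
kept = [row for row in fetched if row['RHD_No'] in expected]
print(kept)  # -> [{'RHD_No': 2}, {'RHD_No': 4}]
```

Using a set (or an associative array in PHP) for the expected IDs matters once the list gets long, since a list lookup per row would be O(n).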
You are accessing 420 rows by primary key, which will probably lead to an index access path. This could mean 2 index pages and one data page per key. If these are in cache, the query should run fast. If not, every page access that goes to disk incurs the usual disk latency. If we assume 5 ms disk latency and 80% cache hits, we arrive at 420 * 3 * 0.2 * 5 ms ≈ 1.26 seconds, which is on the order of what you're seeing.
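Spelled out, the back-of-the-envelope arithmetic above is just:

```python
# Estimate from the answer above: 420 keys, up to 3 page reads per key
# (2 index pages + 1 data page), 80% cache hit rate, 5 ms per read
# that misses cache and goes to disk.
keys = 420
pages_per_key = 3
miss_rate = 0.2          # 1 - 0.8 cache hit rate
disk_latency_s = 0.005   # 5 ms

estimated = round(keys * pages_per_key * miss_rate * disk_latency_s, 2)
print(estimated)  # -> 1.26 (seconds)
```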
You should transform the IN clauses to INNER JOIN clauses.
You can transform a query like this one:
SELECT foo
FROM bar
WHERE bar.stuff IN (SELECT stuff FROM asdf)
Into a query like this other one:
SELECT b.foo
FROM (
    SELECT DISTINCT stuff
    FROM asdf
) a
JOIN bar b ON b.stuff = a.stuff
You will gain a lot of performance.
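You can check that the two forms return the same rows. Here is a small sketch using Python's built-in sqlite3 with the answer's own bar/asdf example tables and made-up data; note the DISTINCT is what keeps duplicates in asdf from duplicating the output rows:

```python
import sqlite3

# Demonstrates the IN-subquery and its join rewrite return the same
# rows, using the bar/asdf tables from the example with invented data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE bar (foo TEXT, stuff INT)')
conn.execute('CREATE TABLE asdf (stuff INT)')
conn.executemany('INSERT INTO bar VALUES (?, ?)',
                 [('a', 1), ('b', 2), ('c', 3)])
conn.executemany('INSERT INTO asdf VALUES (?)',
                 [(2,), (3,), (3,)])  # note the duplicate 3

in_rows = conn.execute(
    'SELECT foo FROM bar WHERE bar.stuff IN (SELECT stuff FROM asdf) '
    'ORDER BY foo').fetchall()
join_rows = conn.execute(
    'SELECT b.foo FROM (SELECT DISTINCT stuff FROM asdf) a '
    'JOIN bar b ON b.stuff = a.stuff ORDER BY foo').fetchall()
print(in_rows == join_rows)  # -> True
print(in_rows)               # -> [('b',), ('c',)]
```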
Since PHP generates the query, try some kind of trick like a temporary table for the items inside the IN clause. Always try to avoid IN clauses with long value lists if you can, because they are very time consuming.
For someone like me using SQLAlchemy, a for loop is also an option:

rows = []
for id in ids:
    row = cls.query.filter(cls.id == id).first()
    if row:
        rows.append(row)
# return rows

Note that this issues one query per ID, so it trades the big IN clause for many small primary-key lookups.