optimize mysql count query

后端未结

关注

 10  637

Is there a way to optimize this further or should I just be satisfied that it takes 9 seconds to count 11M rows ?

devuser@xcmst > mysql --user=user --pass


                      
              相关标签:


      
      
        
          10条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  梦谈多话        
                
              
                            
                2020-12-30 06:03
              
            
            
                                                                       
If you need to return the total table's row count, then there is an alternative to the
SELECT COUNT(*) statement which you can use. SELECT COUNT(*) makes a full table scan to return the total table's row count, so it can take a long time. You can use the sysindexes system table instead in this case. There is a ROWS column in the sysindexes table. This column contains the total row count for each table in your database. So, you can use the following select statement instead of SELECT COUNT(*): 

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2 

This can improve the speed of your query.

EDIT: I have discovered that my answer would be correct if you were using a SQL Server database. MySQL databases do not have a sysindexes table. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  鱼传尺愫        
                
              
                            
                2020-12-30 06:06
              
            
            
                                                                       
If the historical data is not volatile, create a summary table. There are various approaches, the one to choose will depend on how your table is updated, and how often.

For example, assuming old data is rarely/never changed, but recent data is, create a monthly summary table, populated for the previous month at the end of each month (eg insert January's count at the end of February). Once you have your summary table, you can add up the full months and the part months at the beginning and end of the range:

select count(*) 
from record_updates 
where date_updated >= '2009-10-11 15:33:22' and date_updated < '2009-11-01';

select count(*) 
from record_updates 
where date_updated >= '2010-12-00';

select sum(row_count) 
from record_updates_summary 
where date_updated >= '2009-11-01' and date_updated < '2010-12-00';


I've left it split out above for clarity but you can do this in one query:

select ( select count(*)
         from record_updates 
         where date_updated >= '2010-12-00'
               or ( date_updated>='2009-10-11 15:33:22' 
                    and date_updated < '2009-11-01' ) ) +
       ( select count(*) 
         from record_updates 
         where date_updated >= '2010-12-00' );


You can adapt this approach for make the summary table based on whole weeks or whole days.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  天命终不由人        
                
              
                            
                2020-12-30 06:10
              
            
            
                                                                       
If mysql has to count 11M rows, there really isn't much of a way to speed up a simple count. At least not to get it to a sub 1 second speed. You should rethink how you do your count. A few ideas:


Add an auto increment field to the table. It looks you wouldn't delete from the table, so you can use simple math to find the record count. Select the min auto increment number for the initial earlier date and the max for the latter date and subtract one from the other to get the record count. For example: 

SELECT min(incr_id) min_id FROM record_updates WHERE date_updated BETWEEN '2009-10-11 15:33:22' AND '2009-10-12 23:59:59';
SELECT max(incr_id) max_id FROM record_updates WHERE date_updated > DATE_SUB(NOW(), INTERVAL 2 DAY);`

Create another table summarizing the record count for each day. Then you can query that table for the total records. There would only be 365 records for each year. If you need to get down to more fine grained times, query the summary table for full days and the current table for just the record count for the start and end days. Then add them all together.


If the data isn't changing, which it doesn't seem like it is, then summary tables will be easy to maintain and update. They will significantly speed things up.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2020-12-30 06:10
              
            
            
                                                                       
Instead of doing count(*), try doing count(1), like this:-

select count(1) from record_updates where date_updated > '2009-10-11 15:33:22'


I took a DB2 class before, and I remember the instructor mentioned about doing a count(1) when we just want to count number of rows in the table regardless the data because it is technically faster than count(*). Let me know if it makes a difference.

NOTE: Here's a link you might be interested to read: http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  甜味超标        
                
              
                            
                2020-12-30 06:12
              
            
            
                                                                       
It depends on a few things but something like this may work for you

im assuming this count never changes as it is in the past so the result can be cached somehow

count1 = "select count(*) from record_updates where date_updated <= '2009-10-11 15:33:22'"


gives you the total count of records in the table, 
this is an approximate value in innodb table so BEWARE, depends on engine

count2 = "select table_rows from information_schema.`TABLES` where table_schema = 'marctoxctransformation' and TABLE_NAME = 'record_updates'"


your answer

result = count2 - count1
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遥遥无期        
                
              
                            
                2020-12-30 06:13
              
            
            
                                                                       
There is no primary key in your table. It's possible that in this case it always scans the whole table. Having a primary key is never a bad idea.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复