Why does the following join increase the query time significantly?

I have a star schema. I am querying the fact table and would like to join one very small dimension table. I can't really explain the following:

EXPL


                      
3 Answers

时光取名叫无心 (2020-12-07 04:28)

Rewritten with (recommended) explicit ANSI JOIN syntax:

SELECT COUNT(impression_id), imp.os_id, os.os_desc 
FROM   bi.impressions imp
JOIN   bi.os_desc os ON os.os_id = imp.os_id
GROUP  BY imp.os_id, os.os_desc;


First of all, your second query might return wrong counts unless every row in impressions finds exactly one match in os_desc: the inner join drops rows without a match and duplicates rows with more than one.

This can be ruled out if you have a foreign key constraint on os_id in place that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, as a first step, simplify to:

SELECT COUNT(*) AS ct, imp.os_id, os.os_desc 
FROM   bi.impressions imp
JOIN   bi.os_desc     os USING (os_id)
GROUP  BY imp.os_id, os.os_desc;


count(*) is a bit faster than count(column) and equivalent here as long as the column is NOT NULL. The query also adds a column alias (ct) for the count.
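
The precondition above (exactly one match per row) could be checked, and then declared, roughly as follows. This is only a sketch: the constraint names are hypothetical, it assumes os_desc has no primary key yet, and adding constraints to a large fact table takes locks and a full scan, so it may need a maintenance window.

```sql
-- Check 1: impressions rows whose os_id is NULL or has no match in os_desc
-- (an inner join would silently drop these).
SELECT count(*) AS orphaned
FROM   bi.impressions imp
LEFT   JOIN bi.os_desc os ON os.os_id = imp.os_id
WHERE  os.os_id IS NULL;

-- Check 2: os_id values that occur more than once in the dimension
-- (an inner join would multiply the matching impressions rows).
SELECT os_id, count(*) AS dupes
FROM   bi.os_desc
GROUP  BY os_id
HAVING count(*) > 1;

-- If both checks come back clean, the guarantees can be declared.
-- Constraint names below are hypothetical.
ALTER TABLE bi.os_desc     ADD CONSTRAINT os_desc_pkey PRIMARY KEY (os_id);
ALTER TABLE bi.impressions ALTER COLUMN os_id SET NOT NULL;
ALTER TABLE bi.impressions ADD CONSTRAINT impressions_os_id_fkey
   FOREIGN KEY (os_id) REFERENCES bi.os_desc (os_id);
```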

Faster yet:

SELECT os_id, os.os_desc, sub.ct
FROM  (
   SELECT os_id, COUNT(*) AS ct
   FROM   bi.impressions
   GROUP  BY 1
   ) sub
JOIN   bi.os_desc os USING (os_id);


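Aggregate first, join later: the aggregation then runs over the narrow os_id column only, and the join touches just one row per os_id instead of every impression. More here: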


Aggregate a single column in query with many columns  
PostgreSQL - order by an array
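
To see that both forms return the same result, here is a small self-contained sketch. The scratch tables os_desc_demo and impressions_demo are hypothetical stand-ins, not the real bi schema.

```sql
-- Hypothetical scratch tables mirroring the shape of the real ones.
CREATE TEMP TABLE os_desc_demo (
   os_id   int PRIMARY KEY,
   os_desc text NOT NULL
);
CREATE TEMP TABLE impressions_demo (
   impression_id bigint PRIMARY KEY,
   os_id         int NOT NULL REFERENCES os_desc_demo
);

INSERT INTO os_desc_demo VALUES (1, 'Linux'), (2, 'Windows'), (3, 'macOS');

-- Nine impressions, three per os_id.
INSERT INTO impressions_demo
SELECT g, 1 + g % 3
FROM   generate_series(1, 9) g;

-- Join first, then aggregate over the joined (wider) rows.
SELECT os_id, os.os_desc, count(*) AS ct
FROM   impressions_demo imp
JOIN   os_desc_demo os USING (os_id)
GROUP  BY os_id, os.os_desc
ORDER  BY os_id;

-- Aggregate first over the narrow column, then join three rows.
SELECT os_id, os.os_desc, sub.ct
FROM  (
   SELECT os_id, count(*) AS ct
   FROM   impressions_demo
   GROUP  BY os_id
   ) sub
JOIN   os_desc_demo os USING (os_id)
ORDER  BY os_id;
```

Both queries should list each of the three systems with a count of 3. The equivalence relies on the primary key and the NOT NULL foreign key declared on the scratch tables.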

                                                                        
                                                        
            
            
              
                
滥情空心 (2020-12-07 04:48)

HashAggregate  (cost=868719.08..868719.24 rows=16 width=10)
HashAggregate  (cost=1448560.83..1448564.99 rows=416 width=22)


Hmm, the estimated row width going from 10 to 22 bytes is roughly a doubling. Perhaps you should join after grouping instead of before?
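
One way to test that suggestion is to look at the plan of the group-then-join form. A sketch, assuming the table and column names from the question; note that EXPLAIN ANALYZE actually executes the query, so it takes as long as the query itself.

```sql
-- Group on the narrow os_id column first, then join the few result rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT os_id, os.os_desc, sub.ct
FROM  (
   SELECT os_id, count(*) AS ct
   FROM   bi.impressions
   GROUP  BY os_id
   ) sub
JOIN   bi.os_desc os USING (os_id);
```

If the idea holds, the HashAggregate node should report a width close to the first plan line above rather than the second. The actual plan depends on the Postgres version and table statistics.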
                                                                        
                                                        
            
            
              
                
你的背包 (2020-12-07 04:51)

The following query solves the problem without increasing the execution time. The question still stands why the execution time increases so much after adding a very simple join. It may be a Postgres-specific matter that somebody with extensive experience in the area can answer eventually.

WITH OSES AS (
   SELECT os_id, os_desc
   FROM   bi.os_desc
   )
SELECT COUNT(impression_id) AS imp_count, os_desc
FROM   bi.impressions imp, OSES os
WHERE  os.os_id = imp.os_id
GROUP  BY os_desc
ORDER  BY imp_count;
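
A possible explanation for the difference: before PostgreSQL 12 a CTE is always materialized and acts as an optimization fence, which can change the plan. From version 12 on, a simple CTE like the one above is inlined unless it is declared AS MATERIALIZED. A variant that does not depend on this behavior puts the aggregation itself into the CTE. This is a sketch, assuming each os_desc value belongs to exactly one os_id; otherwise it returns one row per os_id where the query above would merge them.

```sql
WITH counts AS (               -- PostgreSQL 12+: optionally AS MATERIALIZED
   SELECT os_id, count(*) AS imp_count
   FROM   bi.impressions
   GROUP  BY os_id
   )
SELECT c.imp_count, os.os_desc
FROM   counts c
JOIN   bi.os_desc os USING (os_id)
ORDER  BY c.imp_count;
```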

                                                                        
                                                        
            
            
              
                