Why does the following join increase the query time significantly?

逝去的感伤 2020-12-07 04:01

I have a star schema, and I am querying the fact table, to which I would like to join one very small dimension table. I can't really explain the following:

EXPL         


        
3 Answers
  • 2020-12-07 04:28

    Rewritten with (recommended) explicit ANSI JOIN syntax:

    SELECT COUNT(impression_id), imp.os_id, os.os_desc 
    FROM   bi.impressions imp
    JOIN   bi.os_desc os ON os.os_id = imp.os_id
    GROUP  BY imp.os_id, os.os_desc;
    

    First of all, your second query may return wrong counts if any row in impressions finds anything other than exactly one match in os_desc: the inner join drops rows without a match, and it counts rows with several matches several times.
    This can be ruled out if you have a foreign key constraint on os_id in place that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, in a first step, simplify to:

    SELECT COUNT(*) AS ct, imp.os_id, os.os_desc 
    FROM   bi.impressions imp
    JOIN   bi.os_desc     os USING (os_id)
    GROUP  BY imp.os_id, os.os_desc;
    

    count(*) is slightly faster than count(column) and equivalent here as long as impression_id is NOT NULL. I also added a column alias for the count.
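    To confirm the precondition from the start of this answer before trusting the simplified query, checks along these lines could be run first. This is a sketch that assumes the table and column names from the question and that bi.os_desc.os_id is meant to be unique; the constraint name impressions_os_id_fkey is hypothetical.

    ```sql
    -- 1. Rows in bi.impressions with no matching os_desc row (NULL os_id included).
    --    The inner join silently drops these from the counts.
    SELECT count(*) AS unmatched
    FROM   bi.impressions imp
    WHERE  NOT EXISTS (
              SELECT 1 FROM bi.os_desc os WHERE os.os_id = imp.os_id);

    -- 2. os_id values that occur more than once in bi.os_desc.
    --    The inner join multiplies the counts for these.
    SELECT os_id, count(*) AS dupes
    FROM   bi.os_desc
    GROUP  BY os_id
    HAVING count(*) > 1;

    -- 3. If both checks come back clean, declare the guarantees so they keep holding.
    --    The foreign key requires a primary key or unique constraint on bi.os_desc(os_id).
    ALTER TABLE bi.impressions ALTER COLUMN os_id SET NOT NULL;
    ALTER TABLE bi.impressions
      ADD CONSTRAINT impressions_os_id_fkey  -- hypothetical name
      FOREIGN KEY (os_id) REFERENCES bi.os_desc (os_id);
    ```

    Both ALTER TABLE statements may have to scan the whole fact table to validate existing rows, so on a large table they are best run at a quiet time.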

    Faster yet:

    SELECT os_id, os.os_desc, sub.ct
    FROM  (
       SELECT os_id, COUNT(*) AS ct
       FROM   bi.impressions
       GROUP  BY 1
       ) sub
    JOIN   bi.os_desc os USING (os_id);
    

    Aggregate first, join later. More here:

    • Aggregate a single column in query with many columns
    • PostgreSQL - order by an array
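    To check whether aggregating first actually pays off on this data, both variants can be compared with EXPLAIN (ANALYZE, BUFFERS), which executes the query and reports actual row counts, times and buffer usage per plan node. A sketch using the queries from this answer:

    ```sql
    -- Join first, then aggregate.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT COUNT(*) AS ct, imp.os_id, os.os_desc
    FROM   bi.impressions imp
    JOIN   bi.os_desc     os USING (os_id)
    GROUP  BY imp.os_id, os.os_desc;

    -- Aggregate first, then join the handful of resulting rows.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT os_id, os.os_desc, sub.ct
    FROM  (
       SELECT os_id, COUNT(*) AS ct
       FROM   bi.impressions
       GROUP  BY 1
       ) sub
    JOIN   bi.os_desc os USING (os_id);
    ```

    Because ANALYZE really runs each query, repeating each one a couple of times helps separate caching effects from genuine plan differences.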
  • 2020-12-07 04:48
    HashAggregate  (cost=868719.08..868719.24 rows=16 width=10)
    HashAggregate  (cost=1448560.83..1448564.99 rows=416 width=22)
    

    Hmm, the width goes from 10 to 22, more than double. Perhaps you should join after grouping instead of before?
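    To see which columns account for the wider rows, EXPLAIN (VERBOSE) prints an output column list for every plan node next to the estimates. A minimal sketch against the join-first query, assuming the column names used in the other answers:

    ```sql
    -- VERBOSE adds an "Output:" line per node; comparing the HashAggregate
    -- output and group key here with the grouped-only query shows where
    -- the extra width comes from.
    EXPLAIN (VERBOSE)
    SELECT COUNT(impression_id), imp.os_id, os.os_desc
    FROM   bi.impressions imp
    JOIN   bi.os_desc os ON os.os_id = imp.os_id
    GROUP  BY imp.os_id, os.os_desc;
    ```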

  • 2020-12-07 04:51

    The following query solves the problem without increasing the query execution time. The question of why adding a very simple join increases the execution time so significantly still stands, but it may be Postgres-specific, and somebody with extensive experience in the area might eventually answer it.

    WITH OSES AS (
       SELECT os_id, os_desc FROM bi.os_desc
    )
    SELECT COUNT(impression_id) AS imp_count,
           os_desc
    FROM   bi.impressions imp,
           OSES os
    WHERE  os.os_id = imp.os_id
    GROUP  BY os_desc
    ORDER  BY imp_count;
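    One version-dependent detail may matter here: before PostgreSQL 12 a CTE was always evaluated separately (an "optimization fence"), whereas from version 12 on a side-effect-free, non-recursive CTE referenced only once is inlined like a subquery unless it is marked MATERIALIZED. Whether that accounts for the difference observed here is not established. To compare both behaviours on PostgreSQL 12 or later, a sketch with the keyword set explicitly:

    ```sql
    -- PostgreSQL 12+: force the pre-12 behaviour (evaluate the CTE separately).
    -- Swap in NOT MATERIALIZED to let the planner inline it instead.
    WITH oses AS MATERIALIZED (
       SELECT os_id, os_desc FROM bi.os_desc
    )
    SELECT COUNT(impression_id) AS imp_count, os.os_desc
    FROM   bi.impressions imp
    JOIN   oses os ON os.os_id = imp.os_id
    GROUP  BY os.os_desc
    ORDER  BY imp_count;
    ```

    Note also that this query groups by os_desc alone, so two os_id values that share a description end up merged into a single row.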
    