Why does the following join increase the query time significantly?

天涯浪子 提交于 2019-11-28 02:15:41

Rewritten with (recommended) explicit ANSI JOIN syntax:

SELECT COUNT(impression_id), imp.os_id, os.os_desc 
FROM   bi.impressions imp
JOIN   bi.os_desc os ON os.os_id = imp.os_id
GROUP  BY imp.os_id, os.os_desc;

First of all, your second query might be wrong, if more or less than exactly one match are found in os_desc for every row in impressions.
This can be ruled out if you have a foreign key constraint on os_id in place, that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, in a first step, simplify to:

SELECT COUNT(*) AS ct, imp.os_id, os.os_desc 
FROM   bi.impressions imp
JOIN   bi.os_desc     os USING (os_id)
GROUP  BY imp.os_id, os.os_desc;

count(*) is faster than count(column) and equivalent here if the column is NOT NULL. And add a column alias for the count.

Faster, yet:

SELECT os_id, os.os_desc, sub.ct
FROM  (
   SELECT os_id, COUNT(*) AS ct
   FROM   bi.impressions
   GROUP  BY 1
   ) sub
JOIN   bi.os_desc os USING (os_id)

Group first, join later. More here:

HashAggregate  (cost=868719.08..868719.24 rows=16 width=10)
HashAggregate  (cost=1448560.83..1448564.99 rows=416 width=22)

Hmm, width from 10 to 22 is a doubling. Perhaps you should join after grouping instead of before?

The following query solves the problem without increasing the query execution time. The question still stands why does the execution time increase significantly with adding a very simple join, but it might be a Postgres specific question and somebody with extensive experience in the area might answer it eventually.

WITH 
  OSES AS (SELECT os_id,os_desc from bi.os_desc) 
SELECT 
  COUNT(impression_id) as imp_count, 
  os_desc FROM bi.impressions imp, 
  OSES os 
WHERE 
  os.os_id=imp.os_id 
GROUP BY os_desc 
ORDER BY imp_count;
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!