I have a star schema and am querying the fact table, which I would like to join to one very small dimension table. I can't really explain the following:
EXPL
Rewritten with (recommended) explicit ANSI JOIN syntax:
SELECT COUNT(impression_id), imp.os_id, os.os_desc
FROM bi.impressions imp
JOIN bi.os_desc os ON os.os_id = imp.os_id
GROUP BY imp.os_id, os.os_desc;
First of all, your second query might be wrong if, for any row in impressions, more or fewer than exactly one match is found in os_desc.
This can be ruled out if you have a foreign key constraint on os_id in place that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, in a first step, simplify to:
SELECT COUNT(*) AS ct, imp.os_id, os.os_desc
FROM bi.impressions imp
JOIN bi.os_desc os USING (os_id)
GROUP BY imp.os_id, os.os_desc;
count(*) is faster than count(column) and equivalent here if the column is NOT NULL. And add a column alias for the count.
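The two caveats above (join cardinality, and count(*) versus count(column)) can be checked on toy data. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Postgres, drops the bi schema prefix, and the table contents are made up for illustration. The tables deliberately lack the foreign key, NOT NULL, and unique constraints so that the failure modes show up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption:
# the toy rows below are invented purely for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE os_desc (os_id INTEGER, os_desc TEXT);
    CREATE TABLE impressions (impression_id INTEGER, os_id INTEGER);
    INSERT INTO os_desc VALUES (1, 'Linux'), (2, 'Windows'), (2, 'Windows');
    INSERT INTO impressions VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
""")

# count(*) counts every row; count(column) skips rows where the column is NULL.
total = con.execute("SELECT COUNT(*) FROM impressions").fetchone()[0]
non_null = con.execute("SELECT COUNT(os_id) FROM impressions").fetchone()[0]

# After the join, os_id 3 and the NULL row find no match and are dropped,
# while os_id 2 finds two matches and is counted twice.
joined = con.execute("""
    SELECT imp.os_id, COUNT(*) AS ct
    FROM impressions imp
    JOIN os_desc os ON os.os_id = imp.os_id
    GROUP BY imp.os_id
    ORDER BY imp.os_id
""").fetchall()

print(total, non_null, joined)  # prints: 4 3 [(1, 1), (2, 2)]
```

With a unique key on os_desc.os_id, a foreign key from impressions.os_id, and NOT NULL on that column, none of these discrepancies can occur, which is why the simplification is safe under those constraints.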
Faster yet:
SELECT os_id, os.os_desc, sub.ct
FROM (
   SELECT os_id, COUNT(*) AS ct
   FROM bi.impressions
   GROUP BY 1
   ) sub
JOIN bi.os_desc os USING (os_id);
Aggregate first, join later. That way the join only has to process one row per os_id instead of every row in impressions.
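To see that aggregating first returns the same rows as joining first when the constraints hold, here is a small sketch. It again uses Python's sqlite3 module in place of Postgres, omits the bi schema prefix, and works on invented sample rows, so it demonstrates equivalence of the results only, not the speed difference.

```python
import sqlite3

# In-memory SQLite stand-in for Postgres; sample rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE os_desc (os_id INTEGER PRIMARY KEY, os_desc TEXT NOT NULL);
    CREATE TABLE impressions (
        impression_id INTEGER PRIMARY KEY,
        os_id INTEGER NOT NULL REFERENCES os_desc (os_id)
    );
    INSERT INTO os_desc VALUES (1, 'Linux'), (2, 'Windows');
    INSERT INTO impressions (os_id) VALUES (1), (1), (2), (1), (2);
""")

# Join first: every impression row passes through the join, then gets grouped.
join_first = con.execute("""
    SELECT imp.os_id, os.os_desc, COUNT(*) AS ct
    FROM impressions imp
    JOIN os_desc os ON os.os_id = imp.os_id
    GROUP BY imp.os_id, os.os_desc
    ORDER BY imp.os_id
""").fetchall()

# Aggregate first: the join only sees one row per os_id.
agg_first = con.execute("""
    SELECT sub.os_id, os.os_desc, sub.ct
    FROM (
        SELECT os_id, COUNT(*) AS ct
        FROM impressions
        GROUP BY os_id
    ) sub
    JOIN os_desc os ON os.os_id = sub.os_id
    ORDER BY sub.os_id
""").fetchall()

print(join_first == agg_first, agg_first)
```

On the real tables, the actual effect on run time would have to be confirmed by comparing the EXPLAIN ANALYZE output of both variants in Postgres.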
HashAggregate (cost=868719.08..868719.24 rows=16 width=10)
HashAggregate (cost=1448560.83..1448564.99 rows=416 width=22)
Hmm, the row width going from 10 to 22 is more than a doubling. Perhaps you should join after grouping instead of before?
The following query solves the problem without increasing the execution time. The question still stands as to why adding a very simple join increases the execution time so significantly, but that may be a Postgres-specific question that somebody with extensive experience in the area might eventually answer.
WITH OSES AS (
   SELECT os_id, os_desc
   FROM bi.os_desc
   )
SELECT COUNT(impression_id) AS imp_count, os_desc
FROM bi.impressions imp, OSES os
WHERE os.os_id = imp.os_id
GROUP BY os_desc
ORDER BY imp_count;