The query:
SELECT "replays_game".*
FROM "replays_game"
INNER JOIN "replays_playeringame"
    ON "replays_game"."id" = "replays_playeringame"."game_id"
WHERE "replays_playeringame"."player_id" = ...
You might get a better execution plan using a multi-column (player_id, game_id) index on the replays_playeringame table. This avoids having to do a random page seek to look up the game id(s) for the player id.
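A minimal sketch of that index (the index name is illustrative):
CREATE INDEX ix_replays_playeringame_player_game
    ON replays_playeringame (player_id, game_id);
With player_id as the leading column, the lookup by player can read the matching game_id values directly out of the index.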
My guess is that you are using the default random_page_cost = 4, which is way too high, making the index scan too costly.
I tried to reconstruct the two tables with this script:
CREATE TABLE replays_game (
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE replays_playeringame (
player_id integer NOT NULL,
game_id integer NOT NULL,
PRIMARY KEY (player_id, game_id),
CONSTRAINT replays_playeringame_game_fkey
FOREIGN KEY (game_id) REFERENCES replays_game (id)
);
CREATE INDEX ix_replays_playeringame_game_id
ON replays_playeringame (game_id);
-- 150k games
INSERT INTO replays_game
SELECT generate_series(1, 150000);
-- ~150k players, ~2 games each
INSERT INTO replays_playeringame
select trunc(random() * 149999 + 1), generate_series(1, 150000);
INSERT INTO replays_playeringame
SELECT *
FROM
(
SELECT
trunc(random() * 149999 + 1) as player_id,
generate_series(1, 150000) as game_id
) AS t
WHERE
NOT EXISTS (
SELECT 1
FROM replays_playeringame
WHERE
t.player_id = replays_playeringame.player_id
AND t.game_id = replays_playeringame.game_id
)
;
-- the heavy player with 3000 games
INSERT INTO replays_playeringame
select 999999, generate_series(1, 3000);
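A quick sanity check on the generated data (the total is approximate, since the second insert skips random duplicates):
SELECT count(*) FROM replays_playeringame;                           -- ~303k rows in total
SELECT count(*) FROM replays_playeringame WHERE player_id = 999999;  -- 3000 games for the heavy player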
With the default value of 4:
game=# set random_page_cost = 4;
SET
game=# explain analyse SELECT "replays_game".*
FROM "replays_game"
INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
WHERE "replays_playeringame"."player_id" = 999999;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1483.54..4802.54 rows=3000 width=4) (actual time=3.640..110.212 rows=3000 loops=1)
Hash Cond: (replays_game.id = replays_playeringame.game_id)
-> Seq Scan on replays_game (cost=0.00..2164.00 rows=150000 width=4) (actual time=0.012..34.261 rows=150000 loops=1)
-> Hash (cost=1446.04..1446.04 rows=3000 width=4) (actual time=3.598..3.598 rows=3000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 106kB
-> Bitmap Heap Scan on replays_playeringame (cost=67.54..1446.04 rows=3000 width=4) (actual time=0.586..2.041 rows=3000 loops=1)
Recheck Cond: (player_id = 999999)
-> Bitmap Index Scan on replays_playeringame_pkey (cost=0.00..66.79 rows=3000 width=0) (actual time=0.560..0.560 rows=3000 loops=1)
Index Cond: (player_id = 999999)
Total runtime: 110.621 ms
After lowering it to 2:
game=# set random_page_cost = 2;
SET
game=# explain analyse SELECT "replays_game".*
FROM "replays_game"
INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
WHERE "replays_playeringame"."player_id" = 999999;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=45.52..4444.86 rows=3000 width=4) (actual time=0.418..27.741 rows=3000 loops=1)
-> Bitmap Heap Scan on replays_playeringame (cost=45.52..1424.02 rows=3000 width=4) (actual time=0.406..1.502 rows=3000 loops=1)
Recheck Cond: (player_id = 999999)
-> Bitmap Index Scan on replays_playeringame_pkey (cost=0.00..44.77 rows=3000 width=0) (actual time=0.388..0.388 rows=3000 loops=1)
Index Cond: (player_id = 999999)
-> Index Scan using replays_game_pkey on replays_game (cost=0.00..0.99 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=3000)
Index Cond: (id = replays_playeringame.game_id)
Total runtime: 28.542 ms
(8 rows)
If using SSD, I would lower it further to 1.1.
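Once a good value is found, it can be made persistent instead of being set per session; a rough sketch (the database and tablespace names are placeholders):
ALTER DATABASE mydb SET random_page_cost = 1.1;           -- per database
ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);  -- per tablespace holding the SSD-backed data
-- or globally in postgresql.conf: random_page_cost = 1.1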
As for your last question, I really think you should stick with PostgreSQL. I have experience with both PostgreSQL and MSSQL, and I had to put in triple the effort into the latter for it to perform half as well as the former.
I ran sayap's testbed code (thanks!) with a few modifications, varying random_page_cost and work_mem as shown below.
After this run, I did the same run, but scaled up tenfold: 1M5 records (30K games for the heavy hitter).
I also ran the same test with a hundred-fold scale-up; the initialisation is rather slow, but those results are included below as well.
Results:
The entries in the cells are the total time in msec, plus a string that denotes the chosen query plan (only a handful of plans occur).
Original 3K / 150K work_mem=16M
rpc | 3K | 1K5 | 750 | 375
--------+---------------+---------------+---------------+------------
8* | 50.8 H.BBi.HS| 44.3 H.BBi.HS| 38.5 H.BBi.HS| 41.0 H.BBi.HS
4 | 43.6 H.BBi.HS| 48.6 H.BBi.HS| 4.34 NBBi | 1.33 NBBi
2 | 6.92 NBBi | 3.51 NBBi | 4.61 NBBi | 1.24 NBBi
1 | 6.43 NII | 3.49 NII | 4.19 NII | 1.18 NII
Original 3K / 150K work_mem=64K
rpc | 3K | 1K5 | 750 | 375
--------+---------------+---------------+---------------+------------
8* | 74.2 H.BBi.HS| 69.6 NBBi | 62.4 H.BBi.HS| 66.9 H.BBi.HS
4 | 6.67 NBBi | 8.53 NBBi | 1.91 NBBi | 2.32 NBBi
2 | 6.66 NBBi | 3.6 NBBi | 1.77 NBBi | 0.93 NBBi
1 | 7.81 NII | 3.26 NII | 1.67 NII | 0.86 NII
Scaled 10*: 30K / 1M5 work_mem=16M
rpc | 30K | 15K | 7k5 | 3k75
--------+---------------+---------------+---------------+------------
8* | 623 H.BBi.HS| 556 H.BBi.HS| 531 H.BBi.HS| 14.9 NBBi
4 | 56.4 M.I.sBBi| 54.3 NBBi | 27.1 NBBi | 19.1 NBBi
2 | 71.0 NBBi | 18.9 NBBi | 9.7 NBBi | 9.7 NBBi
1 | 79.0 NII | 35.7 NII | 17.7 NII | 9.3 NII
Scaled 10*: 30K / 1M5 work_mem=64K
rpc | 30K | 15K | 7k5 | 3k75
--------+---------------+---------------+---------------+------------
8* | 729 H.BBi.HS| 722 H.BBi.HS| 723 H.BBi.HS| 19.6 NBBi
4 | 55.5 M.I.sBBi| 41.5 NBBi | 19.3 NBBi | 13.3 NBBi
2 | 70.5 NBBi | 41.0 NBBi | 26.3 NBBi | 10.7 NBBi
1 | 69.7 NII | 38.5 NII | 20.0 NII | 9.0 NII
Scaled 100*: 300K / 15M work_mem=16M
rpc | 300k | 150K | 75k | 37k5
--------+---------------+---------------+---------------+---------------
8* |7314 H.BBi.HS|9422 H.BBi.HS|6175 H.BBi.HS| 122 N.BBi.I
4 | 569 M.I.sBBi| 199 M.I.sBBi| 142 M.I.sBBi| 105 N.BBi.I
2 | 527 M.I.sBBi| 372 N.BBi.I | 198 N.BBi.I | 110 N.BBi.I
1 | 694 NII | 362 NII | 190 NII | 107 NII
Scaled 100*: 300K / 15M work_mem=64K
rpc | 300k | 150k | 75k | 37k5
--------+---------------+---------------+---------------+------------
8* |22800 H.BBi.HS |21920 H.BBi.HS | 20630 N.BBi.I |19669 H.BBi.HS
4 |22095 H.BBi.HS | 284 M.I.msBBi| 205 B.BBi.I | 116 N.BBi.I
2 | 528 M.I.msBBi| 399 N.BBi.I | 211 N.BBi.I | 110 N.BBi.I
1 | 718 NII | 364 NII | 200 NII | 105 NII
[8*] Note: the random_page_cost=8 runs were only intended as a prerun to prime the disk buffer cache; the results should be ignored.
Legend for node types:
N := Nested loop
M := Merge join
H := Hash (or Hash join)
B := Bitmap heap scan
Bi := Bitmap index scan
S := Seq scan
s := sort
m := materialise
Preliminary conclusion:
The working set for the original query is too small: all of it fits in core, so the cost of page fetches is grossly overestimated. Setting RPC to 2 (or 1) "solves" this problem, but once the query is scaled up, the page costs become dominant and RPC=4 becomes comparable or even better.
Setting work_mem to a lower value is another way to make the optimiser shift to index scans (instead of hash + bitmap scans). The differences I found are smaller than what sayap reported; maybe I have a larger effective_cache_size, or he forgot to prime the cache?
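To reproduce runs like these, the relevant settings can be inspected and changed per session; a minimal sketch using the values from the tables above (pg_prewarm is optional and only available as an extension in PostgreSQL 9.4+):
SHOW effective_cache_size;
SHOW work_mem;
SET work_mem = '64kB';        -- or '16MB', as in the runs above
SET random_page_cost = 2;
-- prime the buffer cache explicitly instead of relying on a pre-run:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('replays_playeringame');
SELECT pg_prewarm('replays_game');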
This is an old post, but it was quite helpful, as I just encountered a similar issue.
Here are my findings so far. Given that there are 151208 rows in replays_game, the average cost of hitting one item is about log(151208) ≈ 12. Since there are 3395 records in replays_playeringame after filtering, the average cost is 12 * 3395, which is rather high. Also, the planner overestimated the page cost: it assumes all rows are randomly distributed, which they are not. If that were true, a seq scan would indeed be much better. So basically, the query plan is trying to avoid the worst-case scenario.
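For concreteness, taking log as the natural log (which matches the 12 above):
SELECT ln(151208);   -- ≈ 11.93, roughly 12 per index lookup
SELECT 12 * 3395;    -- = 40740, the estimated total cost for the 3395 filtered rows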
@dsjoerg's problem is that there is no index on replays_playeringame(game_id). An index scan would always be used if there were an index on replays_playeringame(game_id): the cost of scanning the index would become 3395 + 12 (or something close to that).
@Neil suggested an index on (player_id, game_id), which is close but not exact. The right index to have is either (game_id) alone or (game_id, player_id), as sketched below.
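A minimal sketch of those indexes (the names are illustrative; only one of the two is needed):
CREATE INDEX ix_replays_playeringame_game
    ON replays_playeringame (game_id);
-- or, with player_id included so it can also be returned from the index alone:
CREATE INDEX ix_replays_playeringame_game_player
    ON replays_playeringame (game_id, player_id);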