Add up conditional counts on multiple columns of the same table


清歌不尽 2021-01-21 01:16

I am looking for a "better" way to perform a query in which I want to show a single player who he has played previously and the associated win-loss record for each such oppone

4 answers

逝去的感伤 (original poster)

2021-01-21 02:00
Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be about as fast as it gets, while staying as short as possible:
```
SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;
```
Result exactly as requested:
```
username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1
```
SQL Fiddle - with more revealing test data!
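To try the query locally, a minimal test setup might look like the sketch below. The table definitions are an assumption: only the columns player_id, username, winner_id and loser_id appear in the query, so match_id, the data types, the constraints and the extra players 'zoe' and 'carl' are hypothetical. The rows are chosen so that the query above, run for player 1, returns the three rows shown in the result.

```
-- Assumed schema: only the column names used in the query come from the post.
CREATE TABLE player (
   player_id integer PRIMARY KEY,
   username  text NOT NULL
);

CREATE TABLE match (
   match_id  serial PRIMARY KEY,                    -- hypothetical surrogate key
   winner_id integer NOT NULL REFERENCES player,
   loser_id  integer NOT NULL REFERENCES player
);

INSERT INTO player (player_id, username) VALUES
   (1, 'zoe')      -- hypothetical: the player we query for
 , (2, 'alice')
 , (3, 'bob')
 , (4, 'mary')
 , (5, 'carl');    -- hypothetical: never played against player 1

INSERT INTO match (winner_id, loser_id) VALUES
   (1, 2), (1, 2), (1, 2)   -- player 1 beat alice 3 times
 , (2, 1), (2, 1)           -- alice beat player 1 twice
 , (1, 3)                   -- player 1 beat bob once
 , (1, 4), (1, 4)           -- player 1 beat mary twice
 , (4, 1)                   -- mary beat player 1 once
 , (2, 3), (5, 4);          -- matches without player 1, must not be counted
```

The last two matches and the player carl are there to check that rows not involving player 1 stay out of the result.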

The key feature is the FULL [OUTER] JOIN between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player to get the name, and COALESCE to replace NULL with 0. Voilà.
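The effect of FULL JOIN with USING and COALESCE can be seen in isolation with two small VALUES lists standing in for the subqueries. This is only an illustrative sketch; the numbers are made up and the aliases w and l merely mirror the query above.

```
-- w: opponents our player has beaten, l: opponents our player has lost to
SELECT player_id                  -- one merged column, thanks to USING
     , COALESCE(w.ct, 0) AS won   -- NULL where the opponent only appears in l
     , COALESCE(l.ct, 0) AS lost  -- NULL where the opponent only appears in w
FROM  (VALUES (2, 3), (3, 1)) AS w(player_id, ct)
FULL   JOIN (VALUES (2, 2), (4, 1)) AS l(player_id, ct) USING (player_id)
ORDER  BY player_id;

--  player_id | won | lost
-- -----------+-----+------
--          2 |   3 |    2
--          3 |   1 |    0
--          4 |   0 |    1
```

Opponent 3 exists only on the "won" side and opponent 4 only on the "lost" side. A plain [INNER] JOIN would drop both, and a LEFT JOIN would drop opponent 4.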

Index

Would be even faster with two multicolumn indexes:
```
CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser  on match (loser_id, winner_id);
```
Only if you get index-only scans out of this. Then Postgres does not even visit the match table at all and you get super-fast results.
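Whether you actually get index-only scans can be checked with EXPLAIN. The following is a sketch, assuming the two indexes above exist and the table is big enough for the planner to bother with an index at all. Index-only scans depend on the visibility map, so it helps to run VACUUM first.

```
VACUUM (ANALYZE) match;   -- updates the visibility map and planner statistics

EXPLAIN (ANALYZE, BUFFERS)
SELECT loser_id AS player_id, count(*) AS ct
FROM   match
WHERE  winner_id = 1      -- your player_id here
GROUP  BY 1;
```

In the output, look for a node like "Index Only Scan using idx_winner on match" with "Heap Fetches: 0". On a tiny test table the planner may still pick a sequential scan, which is not a sign that the index is wrong.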

With two integer columns you happen to hit a local optimum: these indexes have just the same size as the simple ones you had. Details:
- Is a composite index also good for queries on the first field?

Shorter, but slow

You could run correlated subqueries like @Giorgi suggested, only implemented correctly:
```
SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;
```
Works fine for small tables, but doesn't scale. This needs a sequential scan on player and two index scans on match per existing player. Compare performance with EXPLAIN ANALYZE.