Add up conditional counts on multiple columns of the same table

后端 未结 4 1207
清歌不尽
清歌不尽 2021-01-21 01:16

I am looking for a \"better\" way to perform a query in which I want to show a single player who he has played previously and the associated win-loss record for each such oppone

相关标签:
4条回答
  • 2021-01-21 01:49

    Solution with correlated subquery:

    SELECT *,
           (SELECT COUNT(*) FROM match WHERE loser_id = p.player_id),
           (SELECT COUNT(*) FROM match WHERE winner_id = p.player_id)
    FROM dbo.player p WHERE player_id <> 1
    

    Solution with UNION and conditional aggregation:

    SELECT  t.loser_id ,
            SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) ,
            SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END)
    FROM    ( SELECT    * , 1 AS result
              FROM      match
              WHERE     winner_id = 1
              UNION ALL
              SELECT    loser_id , winner_id , -1 AS result
              FROM      match
              WHERE     loser_id = 1
            ) t
    GROUP BY t.loser_id
    
    0 讨论(0)
  • 2021-01-21 01:52

    For a single 'subject' player, I would simply union the player in both the winning and losing roles, and sum up the wins / losses:

    SELECT opponent, SUM(won) as won, SUM(lost) as lost
    FROM
    (
        select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
        from "match" m
         inner join "player" w on m.winner_id = w.player_id
    
        UNION ALL
    
        select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
        from "match" m
         inner join "player" l on m.loser_id = l.player_id
    ) x
    WHERE me = 1
    GROUP BY opponent;
    

    For a set based operation, we can just left join the players to the same derived union table:

    SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
    FROM "player" p
    LEFT JOIN
    (
        select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
        from "match" m
         inner join "player" w on m.winner_id = w.player_id
    
        UNION ALL
    
        select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
        from "match" m
         inner join "player" l on m.loser_id = l.player_id
    ) x
    on p.player_id = x.me
    GROUP BY player, opponent;
    

    SqlFiddles of both here

    One small point - the names of the indices must be unique - presumably you meant:

    create index idx_winners on match(winner_id);
    create index idx_losers on match(loser_id);
    
    0 讨论(0)
  • 2021-01-21 02:00

    Query

    The query is not as simple as it looks at first. The shortest query string does not necessarily yield best performance. This should be as fast as it gets, being as short as possible for that:

    SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
    FROM  (
       SELECT loser_id AS player_id, count(*) AS ct
       FROM   match
       WHERE  winner_id = 1  -- your player_id here
       GROUP  BY 1           -- positional reference (not your player_id)
       ) w
    FULL JOIN (
       SELECT winner_id AS player_id, count(*) AS ct
       FROM   match
       WHERE  loser_id = 1   -- your player_id here
       GROUP  BY 1
       ) l USING (player_id)
    JOIN   player p USING (player_id)
    ORDER  BY 1;
    

    Result exactly as requested:

    username | won | lost
    ---------+-----+-----
    alice    | 3   | 2
    bob      | 1   | 0
    mary     | 2   | 1
    

    SQL Fiddle - with more revealing test data!

    The key feature is the FULL [OUTER] JOIN between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

    After that, a single JOIN to player to get the name, and COALESCE to replace NULL with 0. Voilá.

    Index

    Would be even faster with two multicolumn indexes:

    CREATE INDEX idx_winner on match (winner_id, loser_id);
    CREATE INDEX idx_loser  on match (loser_id, winner_id);
    

    Only if you get index-only scans out of this. Then Postgres does not even visit the match table at all and you get super-fast results.

    With two integer columns you happen to hit a local optimum: theses indexes have just the same size as the simple ones you had. Details:

    • Is a composite index also good for queries on the first field?

    Shorter, but slow

    You could run correlated subqueries like @Giorgi suggested, just working correctly:

    SELECT *
    FROM  (
       SELECT username
           , (SELECT count(*) FROM match
              WHERE  loser_id  = p.player_id
              AND    winner_id = 1) AS won
           , (SELECT count(*) FROM match
              WHERE  winner_id = p.player_id
              AND    loser_id  = 1) AS lost
       FROM   player p
       WHERE  player_id <> 1
       ) sub
    WHERE (won > 0 OR lost > 0)
    ORDER  BY username;
    

    Works fine for small tables, but doesn't scale. This needs a sequential scan on player and two index scans on match per existing player. Compare performance with EXPLAIN ANALYZE.

    0 讨论(0)
  • 2021-01-21 02:03

    Something more readable than my original. Thoughts?

    with W as (
        select loser_id as opponent_id,
        count(*) as n
        from match
        where winner_id = 1
        group by loser_id
    ),
    L as (
        select winner_id as opponent_id,
        count(*) as n
        from match
        where loser_id = 1
        group by winner_id
    )
    select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
    from player
    left join W on W.opponent_id = player.player_id
    left join L on L.opponent_id = player.player_id
    where player.player_id != 1;
    
                                     QUERY PLAN                                  
    -----------------------------------------------------------------------------
     Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
       Hash Cond: (player.player_id = l.opponent_id)
       CTE w
         ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
               Group Key: match.loser_id
               ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                     Filter: (winner_id = 1)
       CTE l
         ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
               Group Key: match_1.winner_id
               ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                     Filter: (loser_id = 1)
       ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
             Hash Cond: (player.player_id = w.opponent_id)
             ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
                   Filter: (player_id <> 1)
             ->  Hash  (cost=0.04..0.04 rows=2 width=12)
                   ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
       ->  Hash  (cost=0.04..0.04 rows=2 width=12)
             ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
    

    The above has a performance killer with the player_id != 1. I think I can avoid that by only scanning the results of the joins, no?

    explain with W as (
            select loser_id as opponent_id,
            count(*) as n
            from match
            where winner_id = 1 
            group by loser_id
        ),  
        L as (
            select winner_id as opponent_id,
            count(*) as n
            from match
            where loser_id = 1 
            group by winner_id
        )   
        select t.* from (
            select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
            from player
            left join W on W.opponent_id = player.player_id
            left join L on L.opponent_id = player.player_id
        ) t 
        where t.player_id != 1;
    
                                     QUERY PLAN                                  
    -----------------------------------------------------------------------------
     Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
       Hash Cond: (player.player_id = l.opponent_id)
       CTE w
         ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
               Group Key: match.loser_id
               ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                     Filter: (winner_id = 1)
       CTE l
         ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
               Group Key: match_1.winner_id
               ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                     Filter: (loser_id = 1)
       ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
             Hash Cond: (player.player_id = w.opponent_id)
             ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
                   Filter: (player_id <> 1)
             ->  Hash  (cost=0.04..0.04 rows=2 width=12)
                   ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
       ->  Hash  (cost=0.04..0.04 rows=2 width=12)
             ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
    
    0 讨论(0)
提交回复
热议问题