PostgreSQL MAX and GROUP BY

后端 未结 2 608
一整个雨季
一整个雨季 2021-01-30 06:20

I have a table with id, year and count.

I want to get the MAX(count) for each id and keep the yea

相关标签:
2条回答
  • 2021-01-30 07:08
    select *
    from (
      select id, 
             year,
             thing,
             max(thing) over (partition by id) as max_thing
      from the_table
    ) t
    where thing = max_thing
    

    or:

    select t1.id,
           t1.year,
           t1.thing
    from the_table t1
    where t1.thing = (select max(t2.thing) 
                      from the_table t2
                      where t2.id = t1.id);
    

    or

    select t1.id,
           t1.year,
           t1.thing
    from the_table t1
      join ( 
        select id, max(t2.thing) as max_thing
        from the_table t2
        group by id
      ) t on t.id = t1.id and t.max_thing = t1.thing
    

    or (same as the previous with a different notation)

    with max_stuff as (
      select id, max(t2.thing) as max_thing
      from the_table t2
      group by id
    ) 
    select t1.id, 
           t1.year,
           t1.thing
    from the_table t1
      join max_stuff t2 
        on t1.id = t2.id 
       and t1.thing = t2.max_thing
    
    0 讨论(0)
  • 2021-01-30 07:26

    The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:

    SELECT DISTINCT ON (1)
           id, count, year
    FROM   tbl
    ORDER  BY 1, 2 DESC, 3;
    

    The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:

    SELECT DISTINCT ON (id)
           id, count, year
    FROM   tbl
    ORDER  BY id, count DESC, year;
    

    The result is ordered by id etc. which may or may not be welcome. It's better than "undefined" in any case.

    It also breaks ties (when multiple years share the same maximum count) in a well defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.

    More explanation, links, a benchmark and possibly faster solutions in this closely related answer:

    • Select first row in each GROUP BY group?

    Aside: In a real life query, you wouldn't use some of the column names. id is a non-descriptive anti-pattern for a column name, count is a reserved word in standard SQL and an aggregate function in Postgres.

    0 讨论(0)
提交回复
热议问题