Does “group by” automatically guarantee “order by”?

予麋鹿 2021-02-12 16:51

Does a "group by" clause automatically guarantee that the results will be ordered by that key? In other words, is it enough to write:

select * 
from table
group          


        
6 Answers
  •  孤街浪徒
    2021-02-12 17:11

    An efficient implementation of group by may perform the grouping by sorting the data internally, which is why some RDBMSs return sorted output when grouping. The SQL specification does not mandate that behavior, however, so unless the RDBMS vendor explicitly documents it, I wouldn't bet on it continuing to work (tomorrow). On the other hand, if the RDBMS does sort implicitly, it may also be smart enough to optimize away the redundant order by. @jimmyb
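
    In practice this means stating the ordering explicitly whenever it matters. A minimal sketch, assuming a hypothetical table orders with a customer_id column (neither name comes from the question):

    ```sql
    -- Hypothetical table and column names, for illustration only.
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id;  -- the only clause that guarantees the output order
    ```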

    An example using PostgreSQL that illustrates this behavior:

    Create a table with 1M records whose dates fall randomly within the last 90 days, and index it by date:

    CREATE TABLE WITHDRAW AS
      SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
        md5(random()::text) AS NAM_PERSON,
        (NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- from today back to 90 days ago
        (random() * 1000)::decimal(12, 2) AS NUM_VALUE
      FROM generate_series(1,1000000);
    
    CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
    

    Group by the date truncated to the day, restricting the select to a narrow window (between two days ago and one day ago). The planner picks a HashAggregate, which does not sort:

    EXPLAIN 
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1;
    
    HashAggregate  (cost=11428.33..11594.13 rows=11053 width=48)
      Group Key: date_trunc('DAY'::text, dat_creation)
      ->  Bitmap Heap Scan on withdraw w  (cost=237.73..11345.44 rows=11053 width=14)
            Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
            ->  Bitmap Index Scan on withdraw_dat_creation  (cost=0.00..234.97 rows=11053 width=0)
                  Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
    

    With a wider date range, the planner chooses to apply a Sort and a GroupAggregate instead:

    EXPLAIN 
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1;
    
    GroupAggregate  (cost=116522.65..132918.32 rows=655827 width=48)
      Group Key: (date_trunc('DAY'::text, dat_creation))
      ->  Sort  (cost=116522.65..118162.22 rows=655827 width=14)
            Sort Key: (date_trunc('DAY'::text, dat_creation))
            ->  Seq Scan on withdraw w  (cost=0.00..41949.57 rows=655827 width=14)
                  Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
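
    The choice between these two plans comes from the planner's cost estimates, not from the query text. One way to see this, sketched here on the assumption that the WITHDRAW table created above exists, is to turn off hash aggregation for the session with the planner setting enable_hashagg and re-run the narrow-range EXPLAIN. The planner would then be expected to fall back to an ordered aggregate, though the exact plan depends on the PostgreSQL version and table statistics.

    ```sql
    -- Session-local planner toggle, for experimentation only.
    SET enable_hashagg = off;

    EXPLAIN
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1;

    -- Restore the default so later queries are planned normally.
    RESET enable_hashagg;
    ```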
    

    Adding ORDER BY 1 at the end makes no significant difference to this plan, because the sort is already there:

    GroupAggregate  (cost=116522.44..132918.06 rows=655825 width=48)
      Group Key: (date_trunc('DAY'::text, dat_creation))
      ->  Sort  (cost=116522.44..118162.00 rows=655825 width=14)
            Sort Key: (date_trunc('DAY'::text, dat_creation))
            ->  Seq Scan on withdraw w  (cost=0.00..41949.56 rows=655825 width=14)
                  Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
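
    The query for this last plan is not shown; presumably it is the previous one with ORDER BY 1 appended, roughly:

    ```sql
    EXPLAIN
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1
    ORDER BY 1;  -- explicit ordering; here it can reuse the sort done for grouping
    ```

    When the planner picks HashAggregate instead, the same ORDER BY would be expected to add a separate Sort step on top of the aggregate, and that step is what guarantees the order.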
    

    PostgreSQL 10.3
