Does “group by” automatically guarantee “order by”?

后端 未结 6 1006
予麋鹿
予麋鹿 2021-02-12 16:51

Does \"group by\" clause automatically guarantee that the results will be ordered by that key? In other words, is it enough to write:

select * 
from table
group          


        
相关标签:
6条回答
  • 2021-02-12 17:11

    An efficient implementation of group by would perform the group-ing by sorting the data internally. That's why some RDBMS return sorted output when group-ing. Yet, the SQL specs don't mandate that behavior, so unless explicitly documented by the RDBMS vendor I wouldn't bet on it to work (tomorrow). OTOH, if the RDBMS implicitly does a sort it might also be smart enough to then optimize (away) the redundant order by. @jimmyb

    An example using PostgreSQL proving that concept

    Creating a table with 1M records, with random dates in a day range from today - 90 and indexing by date

    CREATE TABLE WITHDRAW AS
      SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
        md5(random()::text) AS NAM_PERSON,
        (NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- de hoje a 90 dias atras
        (random() * 1000)::decimal(12, 2) AS NUM_VALUE
      FROM generate_series(1,1000000);
    
    CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
    

    Grouping by date truncated by day of month, restricting select by dates in a two days range

    EXPLAIN 
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1
    
    HashAggregate  (cost=11428.33..11594.13 rows=11053 width=48)
      Group Key: date_trunc('DAY'::text, dat_creation)
      ->  Bitmap Heap Scan on withdraw w  (cost=237.73..11345.44 rows=11053 width=14)
            Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
            ->  Bitmap Index Scan on withdraw_dat_creation  (cost=0.00..234.97 rows=11053 width=0)
                  Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
    

    Using a larger restriction date range, it chooses to apply a SORT

    EXPLAIN 
    SELECT
        DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
    FROM WITHDRAW W
    WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
    AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
    GROUP BY 1
    
    GroupAggregate  (cost=116522.65..132918.32 rows=655827 width=48)
      Group Key: (date_trunc('DAY'::text, dat_creation))
      ->  Sort  (cost=116522.65..118162.22 rows=655827 width=14)
            Sort Key: (date_trunc('DAY'::text, dat_creation))
            ->  Seq Scan on withdraw w  (cost=0.00..41949.57 rows=655827 width=14)
                  Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
    

    Just by adding ORDER BY 1 at the end (there is no significant difference)

    GroupAggregate  (cost=116522.44..132918.06 rows=655825 width=48)
      Group Key: (date_trunc('DAY'::text, dat_creation))
      ->  Sort  (cost=116522.44..118162.00 rows=655825 width=14)
            Sort Key: (date_trunc('DAY'::text, dat_creation))
            ->  Seq Scan on withdraw w  (cost=0.00..41949.56 rows=655825 width=14)
                  Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
    

    PostgreSQL 10.3

    0 讨论(0)
  • 2021-02-12 17:12

    It definitely doesn't. I have experienced that, once one of my queries suddenly started to return not-ordered results, as the data in the table grows by.

    0 讨论(0)
  • 2021-02-12 17:20

    It depends on the database vendor.

    For example PostgreSQL does not automatically sort the grouped result. Here you have to use order by to get the data sorted.

    But Sybase and Microsoft SQL Server do. Here you can use order by to change the default sorting.

    0 讨论(0)
  • 2021-02-12 17:26

    I tried it. Adventureworks db of Msdn.

    select HireDate, min(JobTitle)
    from AdventureWorks2016CTP3.HumanResources.Employee
    group by HireDate
    

    Resuts :

    2009-01-10Production Technician - WC40

    2009-01-11Application Specialist

    2009-01-12Assistant to the Chief Financial Officer

    2009-01-13Production Technician - WC50<

    It returns sorted data of hiredate, but you don't rely on GROUP BY to SORT under any circumstances.

    for example; indexes can change this sorted data.

    I added following index (hiredate, jobtitle)

    CREATE NONCLUSTERED INDEX NonClusturedIndex_Jobtitle_hireddate ON [HumanResources].[Employee]
    (
        [JobTitle] ASC,
        [HireDate] ASC
    )
    

    Result will change with same select query;

    2006-06-30 Production Technician - WC60

    2007-01-26 Marketing Assistant

    2007-11-11 Engineering Manager

    2007-12-05 Senior Tool Designer

    2007-12-11 Tool Designer

    2007-12-20 Marketing Manager

    2007-12-26 Production Supervisor - WC60

    You can download Adventureworks2016 at the following address

    https://www.microsoft.com/en-us/download/details.aspx?id=49502

    0 讨论(0)
  • 2021-02-12 17:33

    group by does not order the data neccessarily. A DB is designed to grab the data as fast as possible and only sort if necessary.

    So add the order by if you need a guaranteed order.

    0 讨论(0)
  • 2021-02-12 17:33

    It depends on the number of records. When the records are less, Group by sorted automatically. When the records are more(more than 15) it required adding Order by clause

    0 讨论(0)
提交回复
热议问题