Does “group by” automatically guarantee “order by”?

早过忘川 提交于 2019-12-21 07:39:07

问题


Does "group by" clause automatically guarantee that the results will be ordered by that key? In other words, is it enough to write:

select * 
from table
group by a, b, c

or does one have to write

select * 
from table
group by a, b, c
order by a, b, c

I know e.g. in MySQL I don't have to, but I would like to know if I can rely on it accross the SQL implementations. Is it guaranteed?


回答1:


group by does not order the data neccessarily. A DB is designed to grab the data as fast as possible and only sort if necessary.

So add the order by if you need a guaranteed order.




回答2:


An efficient implementation of group by would perform the group-ing by sorting the data internally. That's why some RDBMS return sorted output when group-ing. Yet, the SQL specs don't mandate that behavior, so unless explicitly documented by the RDBMS vendor I wouldn't bet on it to work (tomorrow). OTOH, if the RDBMS implicitly does a sort it might also be smart enough to then optimize (away) the redundant order by. @jimmyb

An example using PostgreSQL proving that concept

Creating a table with 1M records, with random dates in a day range from today - 90 and indexing by date

CREATE TABLE WITHDRAW AS
  SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
    md5(random()::text) AS NAM_PERSON,
    (NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- de hoje a 90 dias atras
    (random() * 1000)::decimal(12, 2) AS NUM_VALUE
  FROM generate_series(1,1000000);

CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);

Grouping by date truncated by day of month, restricting select by dates in a two days range

EXPLAIN 
SELECT
    DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1

HashAggregate  (cost=11428.33..11594.13 rows=11053 width=48)
  Group Key: date_trunc('DAY'::text, dat_creation)
  ->  Bitmap Heap Scan on withdraw w  (cost=237.73..11345.44 rows=11053 width=14)
        Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
        ->  Bitmap Index Scan on withdraw_dat_creation  (cost=0.00..234.97 rows=11053 width=0)
              Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

Using a larger restriction date range, it chooses to apply a SORT

EXPLAIN 
SELECT
    DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1

GroupAggregate  (cost=116522.65..132918.32 rows=655827 width=48)
  Group Key: (date_trunc('DAY'::text, dat_creation))
  ->  Sort  (cost=116522.65..118162.22 rows=655827 width=14)
        Sort Key: (date_trunc('DAY'::text, dat_creation))
        ->  Seq Scan on withdraw w  (cost=0.00..41949.57 rows=655827 width=14)
              Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

Just by adding ORDER BY 1 at the end (there is no significant difference)

GroupAggregate  (cost=116522.44..132918.06 rows=655825 width=48)
  Group Key: (date_trunc('DAY'::text, dat_creation))
  ->  Sort  (cost=116522.44..118162.00 rows=655825 width=14)
        Sort Key: (date_trunc('DAY'::text, dat_creation))
        ->  Seq Scan on withdraw w  (cost=0.00..41949.56 rows=655825 width=14)
              Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))

PostgreSQL 10.3




回答3:


It definitely doesn't. I have experienced that, once one of my queries suddenly started to return not-ordered results, as the data in the table grows by.




回答4:


I tried it. Adventureworks db of Msdn.

select HireDate, min(JobTitle)
from AdventureWorks2016CTP3.HumanResources.Employee
group by HireDate

Resuts :

2009-01-10Production Technician - WC40

2009-01-11Application Specialist

2009-01-12Assistant to the Chief Financial Officer

2009-01-13Production Technician - WC50<

It returns sorted data of hiredate, but you don't rely on GROUP BY to SORT under any circumstances.

for example; indexes can change this sorted data.

I added following index (hiredate, jobtitle)

CREATE NONCLUSTERED INDEX NonClusturedIndex_Jobtitle_hireddate ON [HumanResources].[Employee]
(
    [JobTitle] ASC,
    [HireDate] ASC
)

Result will change with same select query;

2006-06-30 Production Technician - WC60

2007-01-26 Marketing Assistant

2007-11-11 Engineering Manager

2007-12-05 Senior Tool Designer

2007-12-11 Tool Designer

2007-12-20 Marketing Manager

2007-12-26 Production Supervisor - WC60

You can download Adventureworks2016 at the following address

https://www.microsoft.com/en-us/download/details.aspx?id=49502




回答5:


It depends on the number of records. When the records are less, Group by sorted automatically. When the records are more(more than 15) it required adding Order by clause



来源:https://stackoverflow.com/questions/28149876/does-group-by-automatically-guarantee-order-by

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!