Question
I have a table with these values:
user_id ts val
uid1 19.05.2019 01:49:50 0
uid1 19.05.2019 01:50:15 0
uid1 19.05.2019 01:50:20 0
uid1 19.05.2019 01:59:50 1
uid1 19.05.2019 02:20:10 1
uid1 19.05.2019 02:20:15 0
uid1 19.05.2019 02:20:19 0
uid1 19.05.2019 02:30:53 1
uid1 19.05.2019 11:10:25 1
uid1 19.05.2019 11:13:40 0
uid1 19.05.2019 11:13:50 0
uid1 19.05.2019 11:20:19 1
uid2 19.05.2019 15:01:44 0
uid2 19.05.2019 15:05:55 0
uid2 19.05.2019 17:19:35 1
uid2 19.05.2019 17:20:01 0
uid2 19.05.2019 17:20:35 0
uid2 19.05.2019 19:15:50 1
When I query this table with only a PARTITION BY clause, the result looks like this:
Query: select *, sum(val) over (partition by user_id) as res from example_table;
user_id ts val res
uid1 19.05.2019 01:49:50 0 5
uid1 19.05.2019 01:50:15 0 5
uid1 19.05.2019 01:50:20 0 5
uid1 19.05.2019 01:59:50 1 5
uid1 19.05.2019 02:20:10 1 5
uid1 19.05.2019 02:20:15 0 5
uid1 19.05.2019 02:20:19 0 5
uid1 19.05.2019 02:30:53 1 5
uid1 19.05.2019 11:10:25 1 5
uid1 19.05.2019 11:13:40 0 5
uid1 19.05.2019 11:13:50 0 5
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 2
uid2 19.05.2019 15:05:55 0 2
uid2 19.05.2019 17:19:35 1 2
uid2 19.05.2019 17:20:01 0 2
uid2 19.05.2019 17:20:35 0 2
uid2 19.05.2019 19:15:50 1 2
In the results above, the res column holds the total of the val column for each partition. But if I query the table with both PARTITION BY and ORDER BY, I get these results:
Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;
user_id ts val res
uid1 19.05.2019 01:49:50 0 0
uid1 19.05.2019 01:50:15 0 0
uid1 19.05.2019 01:50:20 0 0
uid1 19.05.2019 01:59:50 1 1
uid1 19.05.2019 02:20:10 1 2
uid1 19.05.2019 02:20:15 0 2
uid1 19.05.2019 02:20:19 0 2
uid1 19.05.2019 02:30:53 1 3
uid1 19.05.2019 11:10:25 1 4
uid1 19.05.2019 11:13:40 0 4
uid1 19.05.2019 11:13:50 0 4
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 0
uid2 19.05.2019 15:05:55 0 0
uid2 19.05.2019 17:19:35 1 1
uid2 19.05.2019 17:20:01 0 1
uid2 19.05.2019 17:20:35 0 1
uid2 19.05.2019 19:15:50 1 2
With the ORDER BY clause, the res column holds the cumulative sum of the val column up to each row within each partition.
Why? I can't understand this.
Answer 1:
Update
This behavior is documented in the PostgreSQL manual:
4.2.8. Window Function Calls
[..] The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
That means:
In the absence of a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. That includes:
- All rows "preceding" the current row according to the ORDER BY clause
- The current row
- All rows which have the same values in the ORDER BY columns as the current row
In the absence of an ORDER BY clause, ORDER BY NULL is assumed (though I'm guessing again). Thus the frame will include all rows from the partition, because the values in the ORDER BY column(s) are the same (always NULL) in every row.
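The peer rule above is easiest to see when two rows share the same ORDER BY value. The following is a minimal sketch for PostgreSQL, not part of the original answer; the inline table t and its integer ts values are made up for illustration.

```sql
-- Hypothetical data: the two rows with ts = 2 are ORDER BY peers.
with t (user_id, ts, val) as (
    values ('uid1', 1, 1),
           ('uid1', 2, 1),
           ('uid1', 2, 1),
           ('uid1', 3, 1)
)
select user_id, ts, val
     -- no frame_clause: the default RANGE frame applies
     , sum(val) over (partition by user_id order by ts) as default_frame
     -- the same frame written out explicitly
     , sum(val) over (
           partition by user_id
           order by ts
           range between unbounded preceding and current row
       ) as range_frame
     -- ROWS counts physical rows and ignores peers
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and current row
       ) as rows_frame
from t
order by ts;
```

Both ts = 2 rows should show 3 in default_frame and range_frame, because each one's frame extends through its last peer. Under rows_frame one of them gets 2 and the other 3, and which is which is not guaranteed because the sort order among peers is unspecified.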
Original answer:
Disclaimer: The following is more a guess than a qualified answer. I didn't find any documentation that confirms what I write. At the same time, I don't think the answers given so far explain the behavior correctly.
The reason for the difference in the results is not directly the ORDER BY clause, since a + b + c is the same as c + b + a. The reason is (and that is my guess) that the ORDER BY clause implicitly defines the frame_clause as
rows between unbounded preceding and current row
Try the following query:
select *
     , sum(val) over (partition by user_id) as res
     , sum(val) over (partition by user_id order by ts) as res_order_by
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and current row
       ) as res_order_by_unbounded_preceding
     , sum(val) over (
           partition by user_id
           -- order by ts
           rows between unbounded preceding and current row
       ) as res_preceding
     , sum(val) over (
           partition by user_id
           -- order by ts
           rows between current row and unbounded following
       ) as res_following
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and unbounded following
       ) as res_orderby_preceding_following
from example_table;
You will see that you can get a cumulative sum without an ORDER BY clause, as well as a "full" sum with the ORDER BY clause.
Answer 2:
That is how order by works with window functions.
When it is not present, the function acts like an aggregation function over the window frame definition. That is, it returns the same value for every row in the window frame.
When it is present, the function acts in a cumulative fashion, with the result "up to" the current row.
Of course, this is also influenced by the window frame specification. However, your example queries do not specify rows or range alongside order by.
Answer 3:
From 3.5. Window Functions:
... You can also control the order in which rows are processed by window functions using ORDER BY within OVER. ...
This is the difference between over (partition by user_id), where the rows inside each group are processed in no particular order, and over (partition by user_id order by ts), which processes the rows after sorting them by ts.
This means that for each row a new sum(val) is calculated based on, and up to, the position of the row in the sorted rows.
Maybe this is easier to understand for the rank() window function, so see the section cited at the beginning of this answer, which has a very good example and more about this topic.
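As a rough illustration of the rank() case mentioned above, here is a minimal PostgreSQL sketch. It is not the example from the manual; the inline table t and its values are made up.

```sql
-- Hypothetical data: two uid1 rows tie on val.
with t (user_id, val) as (
    values ('uid1', 10),
           ('uid1', 20),
           ('uid1', 20),
           ('uid2', 5)
)
select user_id, val
     -- rank() depends entirely on the window ORDER BY
     , rank() over (partition by user_id order by val desc) as rnk
from t
order by user_id, rnk;
```

For uid1 the two rows with val = 20 both get rank 1 and the row with val = 10 gets rank 3, because rows that tie on the ORDER BY columns are peers. The same peer concept determines where the default frame of sum(val) ends.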
Answer 4:
Let's create a simple example to understand it properly.
Consider a bank table with daily credits and debits.
The following query calculates the running daily balance and the total balance for a customer (partition by is used to separate the results for individual customers), as the column names suggest, using the SUM analytic function with and without an ORDER BY clause:
WITH BANK_TABLE (CUST_ID, DT, AMOUNT_CR_DR)
AS
(
    SELECT 1, DATE '2019-01-01', 1000 FROM DUAL UNION ALL
    SELECT 1, DATE '2019-01-02', 2000 FROM DUAL UNION ALL
    SELECT 1, DATE '2019-01-03', -1000 FROM DUAL UNION ALL
    SELECT 1, DATE '2019-01-04', -500 FROM DUAL UNION ALL
    SELECT 1, DATE '2019-01-05', 2000 FROM DUAL
)
SELECT DT, AMOUNT_CR_DR,
    SUM(AMOUNT_CR_DR) OVER (PARTITION BY CUST_ID) AS TOTAL_BALANCE_LIFE_TIME,
    SUM(AMOUNT_CR_DR) OVER (PARTITION BY CUST_ID ORDER BY DT) AS TOTAL_BALANCE_TILL_DATE
FROM BANK_TABLE
ORDER BY CUST_ID, DT;

DT        AMOUNT_CR_DR TOTAL_BALANCE_LIFE_TIME TOTAL_BALANCE_TILL_DATE
--------- ------------ ----------------------- -----------------------
01-JAN-19         1000                    3500                    1000
02-JAN-19         2000                    3500                    3000
03-JAN-19        -1000                    3500                    2000
04-JAN-19         -500                    3500                    1500
05-JAN-19         2000                    3500                    3500
The partition by clause divides the rows into groups, and the order by clause makes the value be calculated in that order.
So, taking the rows in order:
For the 1st row, the sum covers the 1st row only.
For the 2nd row, the sum is the first row plus the second row.
And so on, until the last row of the partition.
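The query above uses Oracle syntax (FROM DUAL). Since the question is about PostgreSQL, here is a sketch of what an equivalent query might look like there, assuming a VALUES list in place of the SELECT ... FROM DUAL rows; the data and column names are the same as above.

```sql
-- PostgreSQL sketch of the same bank example; VALUES replaces FROM DUAL.
with bank_table (cust_id, dt, amount_cr_dr) as (
    values (1, date '2019-01-01', 1000),
           (1, date '2019-01-02', 2000),
           (1, date '2019-01-03', -1000),
           (1, date '2019-01-04', -500),
           (1, date '2019-01-05', 2000)
)
select dt, amount_cr_dr
     -- no ORDER BY: the frame is the whole partition
     , sum(amount_cr_dr) over (partition by cust_id) as total_balance_life_time
     -- ORDER BY: the default frame ends at the current row (and its peers)
     , sum(amount_cr_dr) over (partition by cust_id order by dt) as total_balance_till_date
from bank_table
order by cust_id, dt;
```

The two sum columns should match the output shown above; only the display format of the dt column differs between the two databases.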
Cheers!!
Source: https://stackoverflow.com/questions/57639840/partition-by-with-order-by-clause-in-postgresql