问题
Trying to build a query.
The input is ordered by rownumber in column 'rn' starting with 1 for each unique value in 'name' and defining a given sequence of entries in 'act'. In column 'act' it holds two values in multiple occurence, >sleep< and >wake<. The goal is to find for each consecutive set of rows of one of those values the minimum and maximum value of startt and endd.
This shall be the input:
name act rn startt endd
---------- ---------- ------ ------ ------
jimmy sleep 1 1 3
jimmy wake 2 4 7
jimmy wake 3 8 10
jimmy sleep 4 11 13
karen wake 1 1 4
karen sleep 2 5 7
karen wake 3 8 9
karen wake 4 10 12
karen wake 5 13 14
karen sleep 6 15 17
karen sleep 7 18 20
the desired output:
name act startt endd
---------- ---------- ------ ------
jimmy sleep 1 3
jimmy wake 4 10
jimmy sleep 11 13
karen wake 1 4
karen sleep 5 7
karen wake 8 14
karen sleep 15 20
The source of the input does not provide further columns. The number of members in each subset can be very much higher then in this example.
I tried different ways of aggregating, but none worked. I believe using LEAD
and LAGG
and further trickery might get me there, but that appears to be awfully unelegant. I have the notion it is key to differentiate each subset, i.e. create an identifier unique to all its members. With this at hand an aggregate with min
and max
is simple. Maybe i'm wrong. Maybe it's impossible. Maybe a self join. Maybe a recursive cte. I don't know.
So: does anybody know how to get this? Help is much appreciated.
UPDATE:
Thank You to Gordon Linoff, shawnt00 and the other contributors who commented. With Your advice I feel major gaps in my toolbox of logic closing.
For the interested:
declare @t table (
name nvarchar(10)
,act nvarchar (10)
,startt smallint
,endd smallint
)
insert into @t (
name
,act
,startt
,endd
)
values
('jimmy','sleep', 1,3)
,('jimmy','wake', 4,7)
,('jimmy','wake', 8,10)
,('jimmy','sleep', 11,13)
,('karen','wake', 1,4)
,('karen','sleep', 5,7)
,('karen','wake', 8,9)
,('karen','wake', 10,12)
,('karen','wake', 13,14)
,('karen','sleep', 15,17)
,('karen','sleep', 18,20)
; --- all rows, no aggregating
with
cte as (
select
name
,act
,row_number() over (partition by name order by name,startt) rn
,row_number() over (partition by name, act order by name,startt) act_n
,startt
,endd
from
@t )
select
name
,act
,startt
,endd
,rn
,act_n
,rn - act_n diff
from
cte
order by
name
,rn
;--- aggregating for the desired ouput
with
cte as (
select
name
,act
,row_number() over (partition by name order by name,startt) rn
,row_number() over (partition by name, act order by name,startt) act_n
,startt
,endd
from
@t )
select
name
,act
,min(startt) startt
,max(endd) endd
,min(rn) min_rn
,max(rn) max_rn
from
cte
group by
name
,act
,rn - act_n
order by
name
,min(rn)
回答1:
You want to find consecutive groups of similar rows and then aggregation. I like the difference of row numbers approach:
select name, act, min(startt) as startt, max(endd) as endd
from (select i.*,
row_number() over (partition by name, act order by rn) as seqnum_na,
row_number() over (partition by name order by rn) as seqnum_n
from input i
) i
group by (seqnum_n - seqnum_na), name, act;
You can see how this works by looking at what the subquery does.
回答2:
This assumes that you don't have any gaps in the rn
numbering so it doesn't calculate it again.
with T2 as (
select *, row_number() over (partition by name, act order by rn) as grp_rn
from T
)
select name, act, min(startt) as startt, max(endd) as endd
from T2
group by name, act, rn - grp_rn
order by name, startt;
http://rextester.com/CCQJJ93990
This is a typical gaps and islands query. The key here is that when you have a cluster of rows the two different numberings will increase in step and that means the difference will be a constant for the cluster. This difference increases as you work your way down the list.
来源:https://stackoverflow.com/questions/38151616/find-min-and-max-for-subsets-of-consecutive-rows-gaps-and-islands