问题
I need to group data by id and find max/min (date_from, date_to). But if there is date gap it should be new row.
I have following data:
SYS_ID ITEM_ID DATE_FROM DATE_TO
1 1 01.01.2019 20.01.2019
1 1 15.01.2019 10.02.2019
1 1 15.02.2019 20.02.2019
1 1 18.02.2019 10.03.2019
1 1 10.03.2019 22.03.2019
1 2 01.01.2019 10.01.2019
1 2 15.01.2019 25.01.2019
Result should be:
SYS_ID ITEM_ID DATE_FROM DATE_TO
1 1 01.01.2019 10.02.2019
1 1 15.02.2019 22.03.2019
1 2 01.01.2019 10.01.2019
1 2 15.01.2019 25.01.2019
Is there a way to do this without using cursor?
回答1:
Use gaps and islands approach
Live test: http://sqlfiddle.com/#!18/0174b/3
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
, grouper as
(
select
sys_id, item_id,
date_from, date_to,
sum(gap) over(partition by sys_id, item_id order by date_from) as grp
from gap_detector
)
select
sys_id, item_id,
min(date_from) as date_from,
max(date_to) as date_to
from grouper
group by sys_id, item_id, grp
Output:
| sys_id | item_id | date_from | date_to |
|--------|---------|------------|------------|
| 1 | 1 | 2019-01-01 | 2019-02-10 |
| 1 | 1 | 2019-02-15 | 2019-03-22 |
| 1 | 2 | 2019-01-01 | 2019-01-10 |
| 1 | 2 | 2019-01-15 | 2019-01-25 |
How it works
First we need to detect if date_to from previous row (using lag
) overlaps with current date_from.
Note that we have independent sets of date_from, that is, the previous row of sys_id
+ item_id
combo (e.g., 1,1) does not overlap with another sys_id
+ item_id
combo (e.g., 1,2). So the first previous date_to of 1,2
is not March 22, 2019
, it's NULL
instead. We can properly identify the previous row of each sys_id
+ item_id
combo by partitioning them, i.e., partition by sys_id, item_id
.
With that said here is how we can identify if date_to from previous row overlaps with current date_from:
- If the current date_from overlaps with previous date_to, don't isolate the current date_from from the previous row, we can do this by giving the current row a value of 0.
- Otherwise, if the current date_from does not overlap with previous date_to, isolate (in other word
gap
) the current row from previous row, by marking the current row as a gap, we can do this by giving it a value of 1. It will come later why we need 1 and 0.
Live test: http://sqlfiddle.com/#!18/0174b/7
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
select *
from gap_detector
order by sys_id, item_id, date_from
Output:
| sys_id | item_id | date_from | date_to | gap |
|--------|---------|------------|------------|-----|
| 1 | 1 | 2019-01-01 | 2019-01-20 | 1 |
| 1 | 1 | 2019-01-15 | 2019-02-10 | 0 |
| 1 | 1 | 2019-02-15 | 2019-02-20 | 1 |
| 1 | 1 | 2019-02-18 | 2019-03-10 | 0 |
| 1 | 1 | 2019-03-10 | 2019-03-22 | 0 |
| 1 | 2 | 2019-01-01 | 2019-01-10 | 1 |
| 1 | 2 | 2019-01-15 | 2019-01-25 | 1 |
Next step is to group the the islands that belong to each other by doing a running total over the gap markers (1 and 0). Running total is done by doing a sum(gap)
over the window of sys_id
+ item_id
combo.
Each window of sys_id
+ item_id
combo can be operated on independently by doing a partition
on them, i.e., partition by sys_id, item_id
Live test: http://sqlfiddle.com/#!18/0174b/12
with gap_detector as
(
select
sys_id, item_id,
date_from, date_to,
case when
lag(date_to)
over(partition by sys_id, item_id order by date_from) >= date_from
then
0
else
1
end as gap
from tbl
)
, grouper as
(
select
sys_id, item_id,
date_from, date_to,
gap,
sum(gap) over(partition by sys_id, item_id order by date_from) as grp
from gap_detector
)
select sys_id, item_id, date_from, date_to, gap, grp
from grouper
Output:
| sys_id | item_id | date_from | date_to | gap | grp |
|--------|---------|------------|------------|-----|-----|
| 1 | 1 | 2019-01-01 | 2019-01-20 | 1 | 1 |
| 1 | 1 | 2019-01-15 | 2019-02-10 | 0 | 1 |
| 1 | 1 | 2019-02-15 | 2019-02-20 | 1 | 2 |
| 1 | 1 | 2019-02-18 | 2019-03-10 | 0 | 2 |
| 1 | 1 | 2019-03-10 | 2019-03-22 | 0 | 2 |
| 1 | 2 | 2019-01-01 | 2019-01-10 | 1 | 1 |
| 1 | 2 | 2019-01-15 | 2019-01-25 | 1 | 2 |
Finally, now that we are able to identify which islands belong to each other (denoted by grp
), it's just a matter of doing a group by
on those grp
markers to identify when date_from and date_to started on each group (grp
) of islands.
Live test: http://sqlfiddle.com/#!18/0174b/13
select
sys_id, item_id,
min(date_from) as date_from,
max(date_to) as date_to
from grouper
group by sys_id, item_id, grp
Output:
| sys_id | item_id | date_from | date_to |
|--------|---------|------------|------------|
| 1 | 1 | 2019-01-01 | 2019-02-10 |
| 1 | 1 | 2019-02-15 | 2019-03-22 |
| 1 | 2 | 2019-01-01 | 2019-01-10 |
| 1 | 2 | 2019-01-15 | 2019-01-25 |
来源:https://stackoverflow.com/questions/55906748/group-rows-by-id-and-find-max-mindate-from-date-to-with-date-gaps