Question
My database contains the data shown below. A note on the dates: the date in valid_from is inclusive, the date in valid_to is exclusive.
obj_number | obj_related | valid_from | valid_to |
---|---|---|---|
AA | BB | 01.01.2018 | 01.01.2019 |
AA | BB | 01.01.2019 | 31.03.2019 |
AA | BB | 31.03.2019 | |
AA | CC | 01.01.2020 | 30.06.2020 |
AA | CC | 02.07.2020 | 31.10.2020 |
AA | CC | 31.10.2020 | 31.12.2020 |
AA | DD | 01.01.2018 | 30.11.2020 |
AA | DD | 30.11.2020 | 31.12.2020 |
I have to merge the data in a specific way. Rows should be merged per obj_related so that the result shows the minimum valid_from and the maximum valid_to (or null if the period is open-ended). However, if there is a gap between the dates, as there is for CC (rows 4 and 5), then both records should appear in the result. The easiest way to explain it is to show the expected result:
obj_number | obj_related | valid_from | valid_to |
---|---|---|---|
AA | BB | 01.01.2018 | |
AA | CC | 01.01.2020 | 30.06.2020 |
AA | CC | 02.07.2020 | 31.12.2020 |
AA | DD | 01.01.2018 | 31.12.2020 |
Oracle version: 12.1.0.2
Could you help me write an SQL query for this?
Answer 1:
Here is a different solution, which should work in Oracle 10.1 and higher, using the Tabibitosan method. The problem is slightly complicated by the use of NULL to mark an indefinite valid_to date; in particular, the definition of valid_to in the outer query can't simply be max(valid_to) within each group, since that would produce the wrong answer when valid_to may be null.
Other than that, the computation that produces the grp column in the subquery is the main idea: it produces a different date for each "island" in the "gaps and islands" structure of the input data. This is a lesser-known use of the Tabibitosan method; it makes this kind of query as efficient as possible since it requires only one level of analytic functions.
/*
with
  sample_data (...) as (...)
*/
select obj_number, obj_related, min(valid_from) as valid_from,
       max(valid_to) keep (dense_rank last order by valid_from) as valid_to
from   (
         select sd.*,
                nvl(valid_to, date '9999-12-31') -
                  sum(nvl(valid_to, date '9999-12-31') - valid_from)
                    over (partition by obj_number, obj_related
                          order by valid_from) as grp
         from   sample_data sd
       )
group  by obj_number, obj_related, grp
order  by obj_number, obj_related, valid_from
;
The best way to try to understand how the Tabibitosan method works (in this case) is to run the subquery separately and to see what it produces.
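As a sketch of that suggestion, the subquery can be run on its own against part of the question's sample data. The sample_data rows and the ANSI date literals below are stand-ins for the real table, not part of the original answer. Because the interval lengths telescope, grp works out to the first valid_from in the partition plus the total length of all gaps seen so far, so it stays constant within an island and shifts at every gap.

```sql
-- Inner query of the Tabibitosan approach, run on its own.
-- sample_data is a stand-in for the real table (BB and CC rows from the question).
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'BB', date '2018-01-01', date '2019-01-01' from dual union all
    select 'AA', 'BB', date '2019-01-01', date '2019-03-31' from dual union all
    select 'AA', 'BB', date '2019-03-31', null              from dual union all
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select sd.*,
       -- end of this row minus the running total of interval lengths
       nvl(valid_to, date '9999-12-31')
         - sum(nvl(valid_to, date '9999-12-31') - valid_from)
             over (partition by obj_number, obj_related
                   order by valid_from) as grp
from   sample_data sd
order  by obj_number, obj_related, valid_from;
```

For the three BB rows grp comes out as 01.01.2018. For CC it is 01.01.2020 on the first row and 03.01.2020 on the other two, because the two-day gap between 30.06.2020 and 02.07.2020 shifts the value; that difference is what the outer group by picks up.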
Answer 2:
In Oracle 12.1 or higher, you can solve this easily with match_recognize:
alter session set nls_date_format='dd.mm.yyyy';

with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'BB', to_date('01.01.2018'), to_date('01.01.2019') from dual union all
    select 'AA', 'BB', to_date('01.01.2019'), to_date('31.03.2019') from dual union all
    select 'AA', 'BB', to_date('31.03.2019'), null                  from dual union all
    select 'AA', 'CC', to_date('01.01.2020'), to_date('30.06.2020') from dual union all
    select 'AA', 'CC', to_date('02.07.2020'), to_date('31.10.2020') from dual union all
    select 'AA', 'CC', to_date('31.10.2020'), to_date('31.12.2020') from dual union all
    select 'AA', 'DD', to_date('01.01.2018'), to_date('30.11.2020') from dual union all
    select 'AA', 'DD', to_date('30.11.2020'), to_date('31.12.2020') from dual
  )
select *
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  first(valid_from) as valid_from, last(valid_to) as valid_to
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
);
OBJ_NUMBER OBJ_RELATED  VALID_FROM VALID_TO
---------- ------------ ---------- ----------
AA         BB           01.01.2018
AA         CC           01.01.2020 30.06.2020
AA         CC           02.07.2020 31.12.2020
AA         DD           01.01.2018 31.12.2020
Obviously, the with clause is not part of the solution (remove it and use your actual table and column names); I included it for testing.
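To see how the pattern assigns rows to matches, one option is to switch to all rows per match and expose match_number() and classifier() as measures. This is a diagnostic sketch rather than part of the answer: the column names island_no and matched_as are made up for illustration, and only the CC rows from the sample data are used.

```sql
-- Diagnostic variant: keep every input row and show which match and which
-- pattern variable it was assigned to. island_no and matched_as are
-- illustrative names, not part of the original query.
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select *
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  match_number() as island_no,   -- sequence number of the match within the partition
            classifier()   as matched_as   -- pattern variable the row was mapped to
  all rows per match
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
);
```

The first CC row should come back as match 1 mapped to A. The second row starts match 2 as A, since its valid_from does not equal the previous valid_to, and the third row continues match 2 as B.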
Answer 3:
This is a type of gaps-and-islands problem.
For your sample data, you can use lag() to see if the previous row overlaps. If not, then the row is the start of an island. A cumulative sum of the island starts defines all rows in the island, which can be used for aggregation:
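select obj_number, obj_related,
       min(valid_from) as valid_from,
       -- take valid_to from the last row of each island, so an open-ended (null) end is kept
       max(valid_to) keep (dense_rank last order by valid_from) as valid_to
from (select t.*,
             sum(case when prev_valid_to >= valid_from then 0 else 1 end)
               over (partition by obj_number, obj_related order by valid_from) as grp
      from (select t.*,
                   lag(valid_to) over (partition by obj_number, obj_related order by valid_from) as prev_valid_to
            from t
           ) t
     ) t
group by obj_number, obj_related, grp;  -- grp must be part of the key so each island gets its own row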
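The two intermediate steps can also be inspected without the final aggregation. The sketch below assumes a stand-in sample_data CTE holding only the CC rows from the question, and the island_start column is added purely for illustration.

```sql
-- Show the lag() result, the island-start flag and the running island number.
-- sample_data and island_start are illustrative, not part of the original answer.
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select t.*,
       -- 1 when the row does not touch or overlap the previous one (or has no previous row)
       case when prev_valid_to >= valid_from then 0 else 1 end as island_start,
       -- running total of island starts = island number within the partition
       sum(case when prev_valid_to >= valid_from then 0 else 1 end)
         over (partition by obj_number, obj_related order by valid_from) as grp
from   (select sd.*,
               lag(valid_to) over (partition by obj_number, obj_related
                                   order by valid_from) as prev_valid_to
        from   sample_data sd
       ) t
order  by obj_number, obj_related, valid_from;
```

For these rows island_start should be 1, 1, 0 and grp should be 1, 2, 2: the second row opens a new island because the previous valid_to (30.06.2020) is earlier than its valid_from (02.07.2020).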
Source: https://stackoverflow.com/questions/65940272/merge-data-depending-on-object-and-no-gaps-in-dates