Merge data depending on object and no gaps in dates

Question

My database contains data like the following (note on the dates: valid_from is inclusive, valid_to is exclusive):

obj_number	obj_related	valid_from	valid_to
AA	BB	01.01.2018	01.01.2019
AA	BB	01.01.2019	31.03.2019
AA	BB	31.03.2019
AA	CC	01.01.2020	30.06.2020
AA	CC	02.07.2020	31.10.2020
AA	CC	31.10.2020	31.12.2020
AA	DD	01.01.2018	30.11.2020
AA	DD	30.11.2020	31.12.2020

I have to merge the data in a particular way: rows should be merged per obj_related, showing the minimum valid_from and the maximum valid_to (or null if the last period is open-ended). However, if there is a gap in the dates, as for CC (rows 4 and 5), then the periods on either side of the gap must stay separate in the result. The easiest way to understand this is to look at the expected result:

obj_number	obj_related	valid_from	valid_to
AA	BB	01.01.2018
AA	CC	01.01.2020	30.06.2020
AA	CC	02.07.2020	31.12.2020
AA	DD	01.01.2018	31.12.2020

Oracle version: 12.1.0.2

Could you help me write an SQL query for this?

Answer 1:

Here is a different solution, which should work in Oracle 10.1 and higher, using the Tabibitosan method. The problem is slightly complicated by the use of NULL to mark an indefinite valid_to date; in particular, valid_to in the outer query can't simply be max(valid_to) within each group, because max ignores nulls and would return the latest non-null date instead of NULL for an open-ended island.

Other than that, the computation of the grp column in the subquery is the main idea: it produces a different date for each "island" in the "gaps and islands" structure of the input data. This is a lesser-known use of the Tabibitosan method; it keeps the query efficient, since it needs only one level of analytic functions.

/*
with
  sample_data (...) as (...)
*/
select obj_number, obj_related, min(valid_from) as valid_from,
       max(valid_to) keep (dense_rank last order by valid_from) as valid_to
from   (
         select sd.*,
                nvl(valid_to, date '9999-12-31') - 
                  sum(nvl(valid_to, date '9999-12-31') - valid_from)
                      over (partition by obj_number, obj_related 
                            order     by valid_from) as grp
         from   sample_data sd
       )
group  by obj_number, obj_related, grp
order  by obj_number, obj_related, valid_from
;

The best way to try to understand how the Tabibitosan method works (in this case) is to run the subquery separately and to see what it produces.
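As a minimal sketch of that suggestion, the subquery can be run on its own against just the three CC rows from the question. The inline sample_data below is an assumption made for testing (date literals are used so the result does not depend on nls_date_format); it is not part of the original answer.

```sql
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select sd.*,
       -- valid_to minus the running total of period lengths:
       -- constant while periods are contiguous, shifted by the gap size otherwise
       nvl(valid_to, date '9999-12-31') -
         sum(nvl(valid_to, date '9999-12-31') - valid_from)
             over (partition by obj_number, obj_related
                   order     by valid_from) as grp
from   sample_data sd
order  by obj_number, obj_related, valid_from;
```

The first row gets grp = 01.01.2020 (its own valid_from). The two-day gap before the second row shifts grp to 03.01.2020, and the third row, being contiguous with the second, keeps that same value. Grouping by grp therefore separates the two islands.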

Answer 2:

In Oracle 12.1 or higher, you can solve this easily with match_recognize:

alter session set nls_date_format='dd.mm.yyyy';

with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'BB', to_date('01.01.2018'), to_date('01.01.2019') from dual union all
    select 'AA', 'BB', to_date('01.01.2019'), to_date('31.03.2019') from dual union all
    select 'AA', 'BB', to_date('31.03.2019'), null                  from dual union all
    select 'AA', 'CC', to_date('01.01.2020'), to_date('30.06.2020') from dual union all
    select 'AA', 'CC', to_date('02.07.2020'), to_date('31.10.2020') from dual union all
    select 'AA', 'CC', to_date('31.10.2020'), to_date('31.12.2020') from dual union all
    select 'AA', 'DD', to_date('01.01.2018'), to_date('30.11.2020') from dual union all
    select 'AA', 'DD', to_date('30.11.2020'), to_date('31.12.2020') from dual
  )
select *
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  first(valid_from) as valid_from, last(valid_to) as valid_to
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
);

OBJ_NUMBER OBJ_RELATED  VALID_FROM VALID_TO  
---------- ------------ ---------- ----------
AA         BB           01.01.2018           
AA         CC           01.01.2020 30.06.2020
AA         CC           02.07.2020 31.12.2020
AA         DD           01.01.2018 31.12.2020

Obviously, the with clause is not part of the solution (remove it and use your actual table and column names); I included it for testing.
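To see how the pattern ( a b* ) splits the rows into islands, one option is to switch to ALL ROWS PER MATCH and expose match_number() and classifier() as measures. This is only a diagnostic sketch; the column names island_no and row_role, and the reduced sample data (the CC and DD rows only), are assumptions made for illustration.

```sql
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual union all
    select 'AA', 'DD', date '2018-01-01', date '2020-11-30' from dual union all
    select 'AA', 'DD', date '2020-11-30', date '2020-12-31' from dual
  )
select obj_number, obj_related, valid_from, valid_to, island_no, row_role
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  match_number() as island_no,  -- sequential number of the match within the partition
            classifier()   as row_role    -- pattern variable (a or b) the row was matched to
  all rows per match                      -- keep every input row instead of one row per match
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
);
```

Each row classified as a starts a new island, and each row classified as b continues the previous one because its valid_from equals the preceding valid_to. For CC this should give two matches (the first row alone, then the last two rows together), and for DD a single match covering both rows.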

Answer 3:

This is a type of gaps-and-islands problem.

For your sample data, you can use lag() to see if the previous row overlaps. If not, then the row is the start of an island. A cumulative sum of the island starts defines all rows in the island -- which can be used for aggregation:

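-- grp must be part of the GROUP BY; otherwise the two CC islands collapse into one row
select obj_number, obj_related, min(valid_from) as valid_from,
       -- take valid_to from the latest row so that an open-ended (null) valid_to is kept
       max(valid_to) keep (dense_rank last order by valid_from) as valid_to
from (select t.*,
             sum(case when prev_valid_to >= valid_from then 0 else 1 end) over (partition by obj_number, obj_related order by valid_from) as grp
      from (select t.*,
                   lag(valid_to) over (partition by obj_number, obj_related order by valid_from) as prev_valid_to
            from t
           ) t
     ) t
group by obj_number, obj_related, grp
order by obj_number, obj_related, valid_from;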
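To check how the island starts and the running sum behave, the two inner levels can be run without the aggregation. The sketch below uses only the CC rows from the question as inline test data; the names sample_data and is_start are assumptions made for illustration.

```sql
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select s.*,
       -- running count of island starts = island number within the partition
       sum(is_start) over (partition by obj_number, obj_related
                           order by valid_from) as grp
from   (
         select p.*,
                -- 1 when the row does not touch or overlap the previous period
                -- (also 1 for the first row, where prev_valid_to is null)
                case when prev_valid_to >= valid_from then 0 else 1 end as is_start
         from   (
                  select sd.*,
                         lag(valid_to) over (partition by obj_number, obj_related
                                             order by valid_from) as prev_valid_to
                  from   sample_data sd
                ) p
       ) s
order  by obj_number, obj_related, valid_from;
```

The three rows get is_start = 1, 1, 0 and grp = 1, 2, 2: the first row has no predecessor, the second starts after a gap, and the third begins exactly where the second ends.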

Source: https://stackoverflow.com/questions/65940272/merge-data-depending-on-object-and-no-gaps-in-dates

Tags

sql

Oracle