SQL Challenge/Puzzle: How to merge nested ranges?

后端 未结 3 1745
醉酒成梦
醉酒成梦 2021-01-06 03:40
  • This challenge is based on a real life use-case involving IP ranges.
  • The solution I came with is based on the stack trace challenge I\'ve presented previously
相关标签:
3条回答
  • 2021-01-06 04:25

    Oracle solution:

    with l as ( select level lvl from dual connect by level < 66 ),
         r as ( select range_start r1, range_end r2, range_val v, 
                        range_end - range_start + 1 cnt 
                  from ranges ),
         t1 as (select distinct lvl, 
                       nvl(max(v) keep (dense_rank first order by cnt) 
                                  over (partition by lvl), '*' ) m
                  from l left join r on lvl between r1 and r2 ),
         t2 as (select lvl, m, case when lag(m) over (order by lvl) <> m then 0 else 1 end mrk 
                  from t1),
         t3 as (select lvl, m, lvl - sum(mrk) over (order by lvl) grp from t2)
    select min(lvl) r1, max(lvl) r2, nullif(min(m), '*') val
      from t3 group by grp order by r1
    

    Output is as requested. My English is far from good, so it's hard to explain, but let's try:

    • l - number generator,
    • r - data from ranges with counted distance,
    • t1 - finds value with minimal distance for each lvl,
    • t2 - adds markers telling if range starts,
    • t3 - adds column which we will next use for grouping data.
    0 讨论(0)
  • 2021-01-06 04:41

    Oracle solution 2

     WITH borders AS /*get all borders of interval*/ 
      (SELECT DISTINCT DECODE(is_end, 0, range_start, range_end) AS border 
                      ,is_end 
       FROM   ranges r, 
              (SELECT 0 AS is_end FROM dual UNION ALL 
               SELECT 1 AS is_end FROM dual)), 
     interv AS  /*get all intervals*/ 
      (SELECT border + is_end AS beg_int 
             ,lead(border) over(ORDER BY border, is_end ) 
               - lead(DECODE(is_end, 0, 1, 0)) over(ORDER BY border, is_end) AS end_int 
       FROM   borders 
       ORDER  BY 1) 
     SELECT i.beg_int 
           ,i.end_int 
           ,(SELECT MAX(r.range_val) keep (dense_rank FIRST ORDER BY r.range_end - r.range_start) 
           FROM ranges r 
           WHERE i.beg_int >= r.range_start AND i.end_int <= r.range_end) AS range_val   
     FROM   interv i 
     WHERE  beg_int <= end_int OR end_int IS NULL 
     ORDER  BY i.beg_int; 
    

    Add solution without self join : EDIT: fixed defect.

     WITH intervals AS 
      (SELECT DECODE(is_end, -1, range_val, NULL) AS range_val 
             ,DECODE(is_end, -1, range_start, range_end) AS border 
             ,is_end 
             ,- (SUM(is_end) over(ORDER BY DECODE(is_end, -1, range_start, range_end), is_end, (range_end - range_start) * is_end)) AS poss 
             ,(range_end - range_start) * is_end AS ord2 
       FROM   ranges r 
             ,(SELECT -1 AS is_end FROM   dual UNION ALL 
               SELECT 1  AS is_end FROM   dual)), 
     range_stack AS 
      (SELECT border + DECODE(is_end, 1, 1, 0) AS begin_int 
             ,lead(border) over(ORDER BY border, is_end, ord2) 
               + DECODE(lead(is_end) over(ORDER BY border, is_end, ord2), 1, 0, -1) AS end_int 
             ,last_value(range_val ignore NULLS) over(PARTITION BY poss ORDER BY border, is_end, ord2) AS range_val 
       FROM   intervals) 
     SELECT begin_int 
           ,end_int 
           ,range_val 
     FROM   range_stack 
     WHERE  end_int >= begin_int 
            OR end_int IS NULL;
    
    0 讨论(0)
  • 2021-01-06 04:44
    • The solution is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
    • Performence wise, you may notice how the 2 internal analytic functions use the same windowing, therefore being executed in a single step.

    Teradata

    select      new_range_start
               ,new_range_end
    
               ,last_value (range_val ignore nulls) over 
                (
                    partition by    stack_depth
                    order by        new_range_start ,range_start ,range_end desc 
                    rows            unbounded preceding
                )                                                                   as new_range_val
    
    from       (select      new_range_start
                           ,range_val
                           ,range_start
                           ,range_end
    
                           ,sum (case when range_val is null then -1 else 1 end) over 
                            (
                                order by    new_range_start, range_start ,range_end desc  
                                rows        unbounded preceding
                            )                                                                   as stack_depth
    
                           ,min (new_range_start) over
                            (
                                order by    new_range_start ,range_start ,range_end desc
                                rows        between 1 following and 1 following
    
                            ) - 1                                                               as new_range_end
    
                from        (           select range_start     ,range_start ,range_end ,range_val              from ranges
                            union all   select range_end   + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges
                            )
                            r (new_range_start,range_start,range_end,range_val)
                )
                r
    
    qualify     new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    

    Oracle

    select      new_range_start
               ,new_range_end
               ,new_range_val                       
    
    from       (select      new_range_start
                           ,new_range_end
    
                           ,last_value (range_val ignore nulls) over 
                            (
                                partition by    stack_depth
                                order by        new_range_start ,range_start ,range_end desc 
                                rows            unbounded preceding
                            )                                                                   as new_range_val
    
    
                from       (select      new_range_start
                                       ,range_start
                                       ,range_end
                                       ,range_val
    
                                       ,sum (case when range_val is null then -1 else 1 end) over 
                                        (
                                            order by    new_range_start, range_start ,range_end desc  
                                            rows        unbounded preceding
                                        )                                                                as stack_depth
    
                                       ,lead (new_range_start) over
                                        (
                                            order by    new_range_start, range_start ,range_end desc 
                                        ) - 1                                                            as new_range_end
    
                            from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val              from ranges
                                        union all   select range_end   + 1                    ,range_start ,range_end ,cast (null as char(1)) from ranges
                                        )
                                        r 
                            )
                            r
                )
                r
    
    where       new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    

    SQL Server / PostgreSQL / Hive

    select      *
    
    from       (select      new_range_start
                           ,new_range_end
                           ,min (range_val) over
                            (
                                partition by    stack_depth,new_range_val_group_id
                            )                                                       as new_range_val                       
    
                from       (select      new_range_start
                                       ,new_range_end
                                       ,range_val
                                       ,stack_depth
    
                                       ,count (range_val) over 
                                        (
                                            partition by    stack_depth
                                            order by        new_range_start ,range_start ,range_end desc 
                                            rows            unbounded preceding
                                        )                                                                   as new_range_val_group_id
    
    
                            from       (select      new_range_start
                                                   ,range_start
                                                   ,range_end
                                                   ,range_val
    
                                                   ,sum (case when range_val is null then -1 else 1 end) over 
                                                    (
                                                        order by    new_range_start, range_start ,range_end desc  
                                                        rows        unbounded preceding
                                                    )                                                                as stack_depth
    
                                                   ,lead (new_range_start) over
                                                    (
                                                        order by    new_range_start, range_start ,range_end desc 
                                                    ) - 1                                                            as new_range_end
    
                                        from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val                           from ranges
                                                    union all   select range_end   + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges
                                                    )
                                                    r 
                                        )
                                        r
                            )
                            r
                )
                r
    
    where       new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    
    0 讨论(0)
提交回复
热议问题