Aggregate Overlapping Segments to Measure Effective Length

我寻月下人不归 2021-02-07 02:08

I have a road_events table:

create table road_events (
    event_id number(4,0),
    road_id number(4,0),
    year number(4,0),
    from_meas number         


        
6 Answers
  •  囚心锁ツ
    2021-02-07 02:28

    Solution:

    SELECT RE.road_id, RE.event_id, RE.year, RE.from_meas, RE.to_meas, RE.road_length, RE.event_length, RE.used_length, RE.leftover_length
      FROM
      (
        SELECT RE.C_road_id[road_id], RE.C_event_id[event_id], RE.C_year[year], RE.C_from_meas[from_meas], RE.C_to_meas[to_meas], RE.C_road_length[road_length],
               RE.event_length, RE.used_length, (RE.event_length - (CASE WHEN RE.HasOverlap = 1 THEN RE.used_length ELSE 0 END))[leftover_length]
          FROM
          (
            SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
                   (CASE WHEN MAX(RE.A_event_id) IS NOT NULL THEN 1 ELSE 0 END)[HasOverlap],
                   (RE.C_to_meas - RE.C_from_meas)[event_length],
                   SUM(   (CASE WHEN RE.O_to_meas <= RE.C_to_meas THEN RE.O_to_meas ELSE RE.C_to_meas END)
                        - (CASE WHEN RE.O_from_meas >= RE.C_from_meas THEN RE.O_from_meas ELSE RE.C_from_meas END)
                      )[used_length]--This is the length that is already being counted towards later years.
              FROM
              (
                SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
                       RE.A_event_id, MIN(RE.O_from_meas)[O_from_meas], MAX(RE.O_to_meas)[O_to_meas]
                  FROM
                  (
                    SELECT RE_C.road_id[C_road_id], RE_C.event_id[C_event_id], RE_C.year[C_year], RE_C.from_meas[C_from_meas], RE_C.to_meas[C_to_meas], RE_C.total_road_length[C_road_length],
                           RE_A.road_id[A_road_id], RE_A.event_id[A_event_id], RE_A.year[A_year], RE_A.from_meas[A_from_meas], RE_A.to_meas[A_to_meas], RE_A.total_road_length[A_road_length],
                           RE_O.road_id[O_road_id], RE_O.event_id[O_event_id], RE_O.year[O_year], RE_O.from_meas[O_from_meas], RE_O.to_meas[O_to_meas], RE_O.total_road_length[O_road_length],
                           (ROW_NUMBER() OVER (PARTITION BY RE_C.road_id, RE_C.event_id, RE_O.event_id ORDER BY RE_S.Overlap DESC, RE_A.event_id))[RowNum]--Use to Group Overlaps into Swaths.
                      FROM road_events as RE_C--Current.
                      LEFT JOIN road_events as RE_A--After.  --Use a Left-Join to capture when there is only 1 Event (or it is the Last-Event in the list).
                        ON RE_A.road_id   = RE_C.road_id
                       AND RE_A.event_id != RE_C.event_id--Not the same EventID.
                       AND RE_A.year     >= RE_C.year--Occurred on or After the Current Event.
                       AND (    (RE_A.from_meas >= RE_C.from_meas AND RE_A.from_meas <= RE_C.to_meas)--There is Overlap.
                             OR (RE_A.to_meas   >= RE_C.from_meas AND RE_A.to_meas   <= RE_C.to_meas)--There is Overlap.
                             OR (RE_A.to_meas    = RE_C.to_meas   AND RE_A.from_meas  = RE_C.from_meas)--They are Equal.
                           )
                      LEFT JOIN road_events as RE_O--Overlapped/Linked.
                        ON RE_O.road_id   = RE_C.road_id
                       AND RE_O.event_id != RE_C.event_id--Not the same EventID.
                       AND RE_O.year     >= RE_C.year--Occurred on or After the Current Event.
                       AND (    (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
                             OR (RE_O.to_meas   >= RE_A.from_meas AND RE_O.to_meas   <= RE_A.to_meas)--There is Overlap.
                             OR (RE_O.to_meas    = RE_A.to_meas   AND RE_O.from_meas  = RE_A.from_meas)--They are Equal.
                           )
                      OUTER APPLY
                      (
                        SELECT COUNT(*)[Overlap]
                          FROM road_events as RE_O--Overlapped/Linked.
                         WHERE RE_O.road_id   = RE_C.road_id
                           AND RE_O.event_id != RE_C.event_id--Not the same EventID.
                           AND RE_O.year     >= RE_C.year--Occurred on or After the Current Event.
                           AND (    (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
                                 OR (RE_O.to_meas   >= RE_A.from_meas AND RE_O.to_meas   <= RE_A.to_meas)--There is Overlap.
                                 OR (RE_O.to_meas    = RE_A.to_meas   AND RE_O.from_meas  = RE_A.from_meas)--They are Equal.
                               )
                      ) AS RE_S--Swath of Overlaps.
                  ) AS RE
                 WHERE RE.RowNum = 1--Remove Duplicates and Select those that are in the biggest Swaths.
                 GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
                          RE.A_event_id
              ) AS RE
             GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length
          ) AS RE
      ) AS RE
     WHERE RE.leftover_length > 0--Filter out Events that had their entire Segments overlapped by a Later Event(s).
     ORDER BY RE.road_id, RE.year DESC, RE.event_id
    

    SQL Fiddle:
        http://sqlfiddle.com/#!18/2880b/1

    Added Rules/Assumptions/Clarifications:
    1.) Allow for the possibility that event_id and road_id could be GUIDs or created out of order,
        so do not script assuming that higher or lower values give meaning to the relationship between records.
        For Example:
          An ID of 1 and an ID of 2 do not guarantee that the ID of 2 is the most recent one (and vice versa).
          This is so the solution will be more general and less "hacky".
    2.) Filter out Events whose entire Segments were overlapped by one or more Later Events.
        For Example:
          If 2008 had work on 20-50 and 2009 had work on 10-60,
          then the Event for 2008 would be filtered out because its entire Segment was rehashed in 2009.
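
To make rule 2 concrete, here is a minimal Python sketch of the "leftover length" idea, separate from the SQL above. The function names `covered_length` and `leftover_length` are hypothetical, and the sketch assumes segments are plain `(from_meas, to_meas)` pairs that have already been restricted to later events on the same road.

```python
def covered_length(start, end, later_segments):
    """Length of [start, end] covered by the union of later_segments."""
    # Clip each later segment to the current event; drop the ones that miss it.
    clipped = sorted(
        (max(start, s), min(end, e))
        for s, e in later_segments
        if s < end and e > start
    )
    covered = 0
    cursor = start  # everything to the left of cursor is already counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return covered


def leftover_length(start, end, later_segments):
    """Part of the event that no later segment has re-covered."""
    return (end - start) - covered_length(start, end, later_segments)


# Rule 2 example: 2008 worked on 20-50, 2009 worked on 10-60.
print(leftover_length(20, 50, [(10, 60)]))  # 0, so the 2008 event is filtered out
# A later event that only partly overlaps leaves a remainder.
print(leftover_length(20, 50, [(40, 60)]))  # 20
```

Sorting the clipped segments and advancing a cursor avoids counting the same stretch twice when several later events overlap each other.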

    Additional Test Data:
    To ensure solutions are not tailored to only the DataSet given,
        I have added a road_id of 6 to the original DataSet, in order to hit a few more fringe-cases.

    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (16,6,2012,0,100,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (17,6,2013,68,69,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (18,6,2014,65,66,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (19,6,2015,62,63,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (20,6,2016,50,60,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (21,6,2017,30,40,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (22,6,2017,20,55,100);
    INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (23,6,2018,0,25,100);
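
As a cross-check that does not depend on the SQL, the road 6 rows above can be run through a small Python sketch. This is an illustration under stated assumptions, not a port of the query: the `Event` tuple and `leftover` function are hypothetical names, and only events from a strictly later year are treated as overriding an earlier one (the query uses `>=`, so the two same-year events 21 and 22 are handled differently here).

```python
from collections import namedtuple

Event = namedtuple("Event", "event_id road_id year from_meas to_meas")

# The eight road_id = 6 rows from the INSERT statements above.
ROAD_6 = [
    Event(16, 6, 2012, 0, 100),
    Event(17, 6, 2013, 68, 69),
    Event(18, 6, 2014, 65, 66),
    Event(19, 6, 2015, 62, 63),
    Event(20, 6, 2016, 50, 60),
    Event(21, 6, 2017, 30, 40),
    Event(22, 6, 2017, 20, 55),
    Event(23, 6, 2018, 0, 25),
]


def leftover(event, events):
    """Length of event not re-covered by same-road events from a later year."""
    later = sorted(
        (max(o.from_meas, event.from_meas), min(o.to_meas, event.to_meas))
        for o in events
        if o.road_id == event.road_id
        and o.year > event.year  # assumption: strictly later years only
        and o.from_meas < event.to_meas
        and o.to_meas > event.from_meas
    )
    covered, cursor = 0, event.from_meas
    for s, e in later:
        if e > cursor:  # skip stretches already counted
            covered += e - max(s, cursor)
            cursor = e
    return (event.to_meas - event.from_meas) - covered


results = {ev.event_id: leftover(ev, ROAD_6) for ev in ROAD_6}
for event_id, length in results.items():
    print(event_id, length)
```

Under this sketch's rule, event 16 (0-100 in 2012) keeps 37 units, because later work covers 0-60 plus three one-unit segments, and event 20 (50-60 in 2016) keeps 5 units, because event 22 re-covers 50-55.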
    

    Results: (with the 8 Additional Records I added in Green)

    Database Version:
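    This Solution is intended to work in both SQL Server (SS2008+) and Oracle 12c+.

    This question is tagged with Oracle 12c, but there is no online fiddle I could use without signing up,
        so I tested it in SQL Server only.
    As written, the query uses SQL Server conventions: bracketed column aliases (e.g. [road_id]) and AS before table aliases.
        For Oracle, replace the brackets with plain aliases and drop the AS before each table alias.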
    I rely on Cross-Apply and Outer-Apply for most of my queries.
    Oracle introduced these "Joins" in 12c:
        https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1

    Simplified and Performant:
    This uses:
        • No Correlated Subqueries.
        • No Recursion.
        • No CTEs.
        • No Unions.
        • No User Functions.

    Indexes:
    I read in one of your comments that you had asked about Indexes.
    I would add single-column Indexes for each of the main Fields you will be searching and grouping on:
        road_id, event_id, and year.
    You could also test whether this composite index helps (I don't know how you plan to use the data):
        Key Fields: road_id, event_id, year
        Include: from_meas, to_meas
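
For experimenting with the composite index locally, here is a rough Python sketch using an in-memory SQLite database as a stand-in. Several things are assumptions: the column types, the index name `ix_road_events_cover`, and the use of SQLite itself. SQLite has no INCLUDE clause, so the two included columns are appended as trailing key columns instead, which differs from a SQL Server index with included columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the INSERT statements in this answer; types are assumed.
conn.execute(
    """
    CREATE TABLE road_events (
        event_id INTEGER,
        road_id INTEGER,
        year INTEGER,
        from_meas REAL,
        to_meas REAL,
        total_road_length REAL
    )
    """
)

# Key fields first, then the columns that would be INCLUDEd in SQL Server.
conn.execute(
    """
    CREATE INDEX ix_road_events_cover
        ON road_events (road_id, event_id, year, from_meas, to_meas)
    """
)

# PRAGMA index_list returns one row per index; the name is the second column.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(road_events)")]
print(index_names)
```

Whether such an index actually helps depends on the real engine and data volume, so it is worth comparing execution plans before and after on the target database.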

    Title:
    You may want to consider Renaming the Title of this Question to something more searchable like:
        "Aggregate Overlapping Segments to Measure Effective Length".
    This would allow the solution to be easier to find for helping others with similar problems.

    Other Thoughts:
    Something like this would be useful in Tallying up the Overall-Time spent on something
        with overlapping Start and Stop timestamps.
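
The timestamp variant can be sketched in a few lines of Python. This is only an illustration of the idea, with a hypothetical `total_time` function and made-up sample sessions: overlapping start/stop pairs are merged first, so shared time is counted once.

```python
from datetime import datetime, timedelta


def total_time(intervals):
    """Total duration covered by the union of (start, stop) datetime pairs."""
    total = timedelta(0)
    current_start = current_stop = None
    for start, stop in sorted(intervals):
        if current_stop is None or start > current_stop:
            # Gap found: close the previous merged block and open a new one.
            if current_stop is not None:
                total += current_stop - current_start
            current_start, current_stop = start, stop
        else:
            # Overlap: extend the current merged block if needed.
            current_stop = max(current_stop, stop)
    if current_stop is not None:
        total += current_stop - current_start
    return total


# Hypothetical sessions: the first two overlap by 30 minutes.
sessions = [
    (datetime(2021, 2, 7, 9, 0), datetime(2021, 2, 7, 10, 30)),
    (datetime(2021, 2, 7, 10, 0), datetime(2021, 2, 7, 11, 0)),
    (datetime(2021, 2, 7, 13, 0), datetime(2021, 2, 7, 13, 45)),
]
print(total_time(sessions))  # 2:45:00
```

A naive sum of the three durations would give 3:15:00; merging first removes the double-counted half hour.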
