I have a road_events
table:
create table road_events (
event_id number(4,0),
road_id number(4,0),
year number(4,0),
from_meas number
Solution:
SELECT RE.road_id, RE.event_id, RE.year, RE.from_meas, RE.to_meas, RE.road_length, RE.event_length, RE.used_length, RE.leftover_length
FROM
(
SELECT RE.C_road_id[road_id], RE.C_event_id[event_id], RE.C_year[year], RE.C_from_meas[from_meas], RE.C_to_meas[to_meas], RE.C_road_length[road_length],
RE.event_length, RE.used_length, (RE.event_length - (CASE WHEN RE.HasOverlap = 1 THEN RE.used_length ELSE 0 END))[leftover_length]
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
(CASE WHEN MAX(RE.A_event_id) IS NOT NULL THEN 1 ELSE 0 END)[HasOverlap],
(RE.C_to_meas - RE.C_from_meas)[event_length],
SUM( (CASE WHEN RE.O_to_meas <= RE.C_to_meas THEN RE.O_to_meas ELSE RE.C_to_meas END)
- (CASE WHEN RE.O_from_meas >= RE.C_from_meas THEN RE.O_from_meas ELSE RE.C_from_meas END)
)[used_length]--This is the length that is already being counted towards later years.
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id, MIN(RE.O_from_meas)[O_from_meas], MAX(RE.O_to_meas)[O_to_meas]
FROM
(
SELECT RE_C.road_id[C_road_id], RE_C.event_id[C_event_id], RE_C.year[C_year], RE_C.from_meas[C_from_meas], RE_C.to_meas[C_to_meas], RE_C.total_road_length[C_road_length],
RE_A.road_id[A_road_id], RE_A.event_id[A_event_id], RE_A.year[A_year], RE_A.from_meas[A_from_meas], RE_A.to_meas[A_to_meas], RE_A.total_road_length[A_road_length],
RE_O.road_id[O_road_id], RE_O.event_id[O_event_id], RE_O.year[O_year], RE_O.from_meas[O_from_meas], RE_O.to_meas[O_to_meas], RE_O.total_road_length[O_road_length],
(ROW_NUMBER() OVER (PARTITION BY RE_C.road_id, RE_C.event_id, RE_O.event_id ORDER BY RE_S.Overlap DESC, RE_A.event_id))[RowNum]--Use to Group Overlaps into Swaths.
FROM road_events as RE_C--Current.
LEFT JOIN road_events as RE_A--After. --Use a Left-Join to capture when there is only 1 Event (or it is the Last-Event in the list).
ON RE_A.road_id = RE_C.road_id
AND RE_A.event_id != RE_C.event_id--Not the same EventID.
AND RE_A.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_A.from_meas >= RE_C.from_meas AND RE_A.from_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas >= RE_C.from_meas AND RE_A.to_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas = RE_C.to_meas AND RE_A.from_meas = RE_C.from_meas)--They are Equal.
)
LEFT JOIN road_events as RE_O--Overlapped/Linked.
ON RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
OUTER APPLY
(
SELECT COUNT(*)[Overlap]
FROM road_events as RE_O--Overlapped/Linked.
WHERE RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
) AS RE_S--Swath of Overlaps.
) AS RE
WHERE RE.RowNum = 1--Remove Duplicates and Select those that are in the biggest Swaths.
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id
) AS RE
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length
) AS RE
) AS RE
WHERE RE.leftover_length > 0--Filter out Events that had their entire Segments overlapped by a Later Event(s).
ORDER BY RE.road_id, RE.year DESC, RE.event_id
SQL Fiddle:
http://sqlfiddle.com/#!18/2880b/1
Added Rules/Assumptions/Clarifications:
1.) Allow for the possibility event_id
and road_id
could be Guid's or created out-of-order,
so do not script assuming higher or lower values give meaning to the relationship of records.
For Example:
An ID of 1 and and ID of 2 does not guarantee the ID of 2 is the most recent one (and vice-versa).
This is so the solution will be more general and less "hacky".
2.) Filter out Events that had their entire Segments overlapped by a Later Event(s).
For Example:
If 2008 had work on 20-50 and 2009 had work on 10-60,
then the Event for 2008 would be filtered out because its entire Segment was rehashed in 2009.
Additional Test Data:
To ensure solutions are not tailored to only the DataSet given,
I have added a road_id
of 6 to the original DataSet, in order to hit a few more fringe-cases.
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (16,6,2012,0,100,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (17,6,2013,68,69,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (18,6,2014,65,66,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (19,6,2015,62,63,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (20,6,2016,50,60,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (21,6,2017,30,40,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (22,6,2017,20,55,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (23,6,2018,0,25,100);
Results: (with the 8 Additional Records I added in Green)
Database Version:
This Solution is Oracle and SQL-Server Agnostic:
It Should Work in both SS2008+ and Oracle 12c+.
This question is tagged with Oracle 12c, but there is no online-fiddle I may use without signing up,
so I tested it in SQL-Server - but the same syntax should work in both.
I rely on Cross-Apply
and Outer-Apply
for most of my queries.
Oracle introduced these "Joins" in 12c:
https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1
Simplified and Performant:
This uses:
• No Correlated Subqueries.
• No Recursion.
• No CTE's.
• No Unions.
• No User Functions.
Indexes:
I read in one of your comments you had asked about Indexes.
I would add 1-Column Indexes for each the main Fields you will be searching and grouping on:
road_id
, event_id
, and year
.
You could see if this index would help you any (I don't know how you plan to use the data):
Key Fields: road_id
, event_id
, year
Include: from_meas
, to_meas
Title:
You may want to consider Renaming the Title of this Question to something more searchable like:
"Aggregate Overlapping Segments to Measure Effective Length".
This would allow the solution to be easier to find for helping others with similar problems.
Other Thoughts:
Something like this would be useful in Tallying up the Overall-Time spent on something
with overlapping Start and Stop timestamps.