Question
I have a road_events table:
create table road_events (
event_id number(4,0),
road_id number(4,0),
year number(4,0),
from_meas number(10,2),
to_meas number(10,2),
total_road_length number(10,2)
);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300);
commit;
select * from road_events;
EVENT_ID   ROAD_ID    YEAR       FROM_MEAS  TO_MEAS    TOTAL_ROAD_LENGTH
---------- ---------- ---------- ---------- ---------- -----------------
1          1          2020       25         50         100
2          1          2000       25         50         100
3          1          1980       0          25         100
4          1          1960       75         100        100
5          1          1940       1          100        100
6          2          2000       10         30         100
7          2          1975       30         60         100
8          2          1950       50         90         100
9          3          2050       40         90         100
10         4          2040       0          200        200
11         4          2013       0          199        200
12         4          2001       0          200        200
13         5          1985       50         70         300
14         5          1985       10         50         300
15         5          1965       1          301        300
I want to select the events that represent the most recent work on each road.
This is a tricky operation, because the events often pertain to only a portion of the road. This means that I can't simply select the most recent event per road; I need to only select the most recent event mileage that doesn't overlap.
Possible logic (in order):
I'm reluctant to guess at how this problem could be solved, because it could end up hurting more than it helps (kind of like the XY Problem). On the other hand, it might provide insight into the nature of the problem, so here it goes:
1. Select the most recent event for each road. We'll call the most recent event: event A.
2. If event A is >= total_road_length, then that's all I need. The algorithm ends here.
3. Else, get the next chronological event (event B) that does not have the same extents as event A.
4. If the extents of event B overlap the extents of event A, then only get the portion(s) of event B that do not overlap.
5. Repeat steps 3 and 4 until the total event length is = total_road_length. Or stop when there are no more events for that road.
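The logic above can be sketched outside the database to check the expected numbers. The following Python is only an illustration under a few assumptions: the function names are hypothetical, ranges are treated as half-open [from_meas, to_meas), ties on year are broken by the higher event_id, and it does not stop early once the road is fully covered (later events simply contribute zero length).

```python
from collections import defaultdict

# (event_id, road_id, year, from_meas, to_meas) copied from the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (6, 2, 2000, 10, 30), (7, 2, 1975, 30, 60), (8, 2, 1950, 50, 90),
    (9, 3, 2050, 40, 90),
    (10, 4, 2040, 0, 200), (11, 4, 2013, 0, 199), (12, 4, 2001, 0, 200),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def subtract(segments, cut_from, cut_to):
    """Remove [cut_from, cut_to) from a list of disjoint (from, to) segments."""
    result = []
    for seg_from, seg_to in segments:
        if cut_to <= seg_from or cut_from >= seg_to:
            result.append((seg_from, seg_to))  # no overlap, keep the whole segment
            continue
        if seg_from < cut_from:
            result.append((seg_from, cut_from))  # piece left of the cut
        if cut_to < seg_to:
            result.append((cut_to, seg_to))  # piece right of the cut
    return result


def effective_lengths(events):
    """Return {event_id: length not covered by any newer event on the same road}."""
    by_road = defaultdict(list)
    for event in events:
        by_road[event[1]].append(event)

    lengths = {}
    for road_events in by_road.values():
        # Newest first; ties on year broken by higher event_id (an assumption).
        road_events.sort(key=lambda e: (e[2], e[0]), reverse=True)
        covered = []  # ranges already claimed by newer events
        for event_id, _road_id, _year, from_meas, to_meas in road_events:
            remaining = [(from_meas, to_meas)]
            for c_from, c_to in covered:
                remaining = subtract(remaining, c_from, c_to)
            length = sum(seg_to - seg_from for seg_from, seg_to in remaining)
            if length > 0:
                lengths[event_id] = length
            covered.append((from_meas, to_meas))
    return lengths


print(effective_lengths(EVENTS))
```

On the sample data this yields the same event_id and EVENT_LENGTH pairs as the desired result listed in the question.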
Question:
I know it's a tall order, but what would it take to do this?
This is a classic linear referencing problem. It would be extremely helpful if I could do linear referencing operations as part of queries.
The result would be:
EVENT_ID   ROAD_ID    YEAR       TOTAL_ROAD_LENGTH EVENT_LENGTH
---------- ---------- ---------- ----------------- ------------
1          1          2020       100               25
3          1          1980       100               25
4          1          1960       100               25
5          1          1940       100               25
6          2          2000       100               20
7          2          1975       100               30
8          2          1950       100               30
9          3          2050       100               50
10         4          2040       200               200
13         5          1985       300               20
14         5          1985       300               40
15         5          1965       300               240
Related question: Select where number range does not overlap
Answer 1:
My main DBMS is Teradata, but this will work as-is in Oracle, too.
WITH all_meas AS
( -- get a distinct list of all from/to points
SELECT road_id, from_meas AS meas
FROM road_events
UNION
SELECT road_id, to_meas
FROM road_events
)
-- select * from all_meas order by 1,2
, all_ranges AS
( -- create from/to ranges
SELECT road_id, meas AS from_meas
,Lead(meas)
Over (PARTITION BY road_id
ORDER BY meas) AS to_meas
FROM all_meas
)
-- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
( -- now match the ranges to the event ranges
SELECT
ar.*
,re.event_id
,re.year
,re.total_road_length
,ar.to_meas - ar.from_meas AS event_length
-- used to filter the latest event as multiple events might cover the same range
,Row_Number()
Over (PARTITION BY ar.road_id, ar.from_meas
ORDER BY year DESC) AS rn
FROM all_ranges ar
JOIN road_events re
ON ar.road_id = re.road_id
AND ar.from_meas < re.to_meas
AND ar.to_meas > re.from_meas
WHERE ar.to_meas IS NOT NULL
)
SELECT event_id, road_id, year, total_road_length, Sum(event_length)
FROM all_event_ranges
WHERE rn = 1 -- latest year only
GROUP BY event_id, road_id, year, total_road_length
ORDER BY road_id, year DESC;
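To make the three CTEs easier to follow, here is a small Python sketch of the same idea (not part of the original answer; the function name is hypothetical). It collects the distinct from/to points per road, turns consecutive points into elementary ranges, and credits each range to the newest overlapping event. Like the SQL, it picks an arbitrary winner if two overlapping events share a year, which does not happen in the sample data.

```python
from collections import defaultdict

# (event_id, road_id, year, from_meas, to_meas) copied from the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (6, 2, 2000, 10, 30), (7, 2, 1975, 30, 60), (8, 2, 1950, 50, 90),
    (9, 3, 2050, 40, 90),
    (10, 4, 2040, 0, 200), (11, 4, 2013, 0, 199), (12, 4, 2001, 0, 200),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def latest_event_lengths(events):
    """Return {event_id: summed length of elementary ranges won by that event}."""
    by_road = defaultdict(list)
    for event in events:
        by_road[event[1]].append(event)

    totals = defaultdict(int)
    for road_events in by_road.values():
        # all_meas: distinct from/to points of this road
        points = sorted({m for e in road_events for m in (e[3], e[4])})
        # all_ranges: consecutive points form the elementary ranges (LEAD)
        for range_from, range_to in zip(points, points[1:]):
            # all_event_ranges: events overlapping this elementary range
            covering = [
                e for e in road_events
                if range_from < e[4] and range_to > e[3]
            ]
            if not covering:
                continue  # a gap that no event covers
            newest = max(covering, key=lambda e: e[2])  # the rn = 1 row
            totals[newest[0]] += range_to - range_from
    return dict(totals)


print(latest_event_lengths(EVENTS))
```

Because every elementary range lies entirely inside or entirely outside each event, picking one winner per range is enough; no range ever has to be split further.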
If you need to return the actual covered from/to_meas (as in your question before edit), it might be more complicated. The first part is the same, but without aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 & 1-25):
SELECT * FROM all_event_ranges
WHERE rn = 1
ORDER BY road_id, from_meas;
If you want to merge adjacent rows you need two more steps (using a standard approach, flag the 1st row of a group and calculate a group number):
WITH all_meas AS
(
SELECT road_id, from_meas AS meas
FROM road_events
UNION
SELECT road_id, to_meas
FROM road_events
)
-- select * from all_meas order by 1,2
, all_ranges AS
(
SELECT road_id, meas AS from_meas
,Lead(meas)
Over (PARTITION BY road_id
ORDER BY meas) AS to_meas
FROM all_meas
)
-- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
(
SELECT
ar.*
,re.event_id
,re.year
,re.total_road_length
,ar.to_meas - ar.from_meas AS event_length
,Row_Number()
Over (PARTITION BY ar.road_id, ar.from_meas
ORDER BY year DESC) AS rn
FROM all_ranges ar
JOIN road_events re
ON ar.road_id = re.road_id
AND ar.from_meas < re.to_meas
AND ar.to_meas > re.from_meas
WHERE ar.to_meas IS NOT NULL
)
-- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas
, adjacent_events AS
( -- assign 1 to the 1st row of an event
SELECT t.*
,CASE WHEN Lag(event_id)
Over(PARTITION BY road_id
ORDER BY from_meas) = event_id
THEN 0
ELSE 1
END AS flag
FROM all_event_ranges t
WHERE rn = 1
)
-- SELECT * FROM adjacent_events ORDER BY road_id, from_meas
, grouped_events AS
( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1
SELECT t.*
,Sum(flag)
Over (PARTITION BY road_id
ORDER BY from_meas
ROWS Unbounded Preceding) AS grp
FROM adjacent_events t
)
-- SELECT * FROM grouped_events ORDER BY road_id, from_meas
SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)
FROM grouped_events
GROUP BY event_id, road_id, grp, year, total_road_length
ORDER BY 2, Min(from_meas);
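The flag and cumulative-sum steps can be hard to picture, so here is a minimal Python sketch of that grouping technique for a single road (an illustration only; the function name is hypothetical and the input rows are the rn = 1 rows I would expect for road 1 from the sample data).

```python
from itertools import accumulate, groupby


def merge_adjacent(rows):
    """rows: (event_id, from_meas, to_meas) for one road; returns merged rows."""
    rows = sorted(rows, key=lambda r: r[1])
    # flag = 1 on the first row of each run (previous event_id differs or is missing)
    flags = [
        0 if i > 0 and rows[i - 1][0] == row[0] else 1
        for i, row in enumerate(rows)
    ]
    # grp = cumulative sum of the flags, so every run shares one group number
    group_numbers = accumulate(flags)
    merged = []
    for _grp, run in groupby(zip(group_numbers, rows), key=lambda pair: pair[0]):
        run_rows = [row for _number, row in run]
        merged.append((
            run_rows[0][0],                # event_id of the run
            min(r[1] for r in run_rows),   # Min(from_meas)
            max(r[2] for r in run_rows),   # Max(to_meas)
        ))
    return merged


ROAD_1_ROWS = [(3, 0, 1), (3, 1, 25), (1, 25, 50), (5, 50, 75), (4, 75, 100)]
print(merge_adjacent(ROAD_1_ROWS))
```

Grouping by the running sum rather than by event_id alone matters when the same event wins two ranges that are not adjacent: those stay separate rows because a different event sits between them.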
Edit:
Oops, I just found a blog post, Overlapping ranges with priority, doing exactly the same with some simplified Oracle syntax. In fact I translated my query from some other simplified syntax in Teradata to Standard/Oracle SQL :-)
Answer 2:
There is another way to calculate this, with from and to values:
with
part_begin_point as (
Select distinct road_id, from_meas point
from road_events be
union
Select distinct road_id, to_meas point
from road_events ee
)
, newest_part as (
select e.event_id
, e.road_id
, e.year
, e.total_road_length
, p.point
, LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event
, e.to_meas event_to_meas
from part_begin_point p
join road_events e
on p.road_id = e.road_id
and p.point >= e.from_meas and p.point < e.to_meas
and not exists(
select 1 from road_events ne
where e.road_id = ne.road_id
and p.point >= ne.from_meas and p.point < ne.to_meas
and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id))
)
select event_id, road_id, year
, point from_meas
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas
, total_road_length
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTH
from newest_part
where (event_id <> prev_event or prev_event is null)
order by event_id, point
SQL Fiddle
Answer 3:
Thought about this too much today, but I have something that ignores the +/- 10 meters now.
First made a function that takes in to / from pairs as a string and returns the distance covered by the pairs in the string. For example '10:20;35:45' returns 20.
CREATE
OR replace FUNCTION get_distance_range_str (strRangeStr VARCHAR2)
RETURN NUMBER IS intRetNum NUMBER;
BEGIN
--split input string
WITH cte_1
AS (
SELECT regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) AS TO_FROM_STRING
FROM dual connect BY regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) IS NOT NULL
)
--split From/To pairs
,cte_2
AS (
SELECT cte_1.TO_FROM_STRING
,to_number(substr(cte_1.TO_FROM_STRING, 1, instr(cte_1.TO_FROM_STRING, ':') - 1)) AS FROM_MEAS
,to_number(substr(cte_1.TO_FROM_STRING, instr(cte_1.TO_FROM_STRING, ':') + 1, length(cte_1.TO_FROM_STRING) - instr(cte_1.TO_FROM_STRING, ':'))) AS TO_MEAS
FROM cte_1
)
--merge ranges
,cte_merge_ranges
AS (
SELECT s1.FROM_MEAS
,
--t1.TO_MEAS
MIN(t1.TO_MEAS) AS TO_MEAS
FROM cte_2 s1
INNER JOIN cte_2 t1 ON s1.FROM_MEAS <= t1.TO_MEAS
AND NOT EXISTS (
SELECT *
FROM cte_2 t2
WHERE t1.TO_MEAS >= t2.FROM_MEAS
AND t1.TO_MEAS < t2.TO_MEAS
)
WHERE NOT EXISTS (
SELECT *
FROM cte_2 s2
WHERE s1.FROM_MEAS > s2.FROM_MEAS
AND s1.FROM_MEAS <= s2.TO_MEAS
)
GROUP BY s1.FROM_MEAS
)
SELECT sum(TO_MEAS - FROM_MEAS) AS DISTANCE_COVERED
INTO intRetNum
FROM cte_merge_ranges;
RETURN intRetNum;
END;
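For readers who want to check the function's behaviour without an Oracle instance, here is a rough Python equivalent (a sketch, not a translation of the PL/SQL line by line). It parses the same 'from:to;from:to' format and sums the length of the merged ranges. One difference to note: it returns 0 for an empty string, whereas the query below wraps the Oracle function in coalesce for that case.

```python
def get_distance_range_str(range_str):
    """Length covered by 'from:to' pairs separated by ';', counting overlaps once."""
    pairs = []
    for chunk in range_str.split(";"):
        if not chunk:
            continue  # tolerate a trailing ';'
        start, end = chunk.split(":")
        pairs.append((float(start), float(end)))

    total = 0.0
    current_from = current_to = None
    for start, end in sorted(pairs):
        if current_to is None or start > current_to:
            # A new, disjoint range begins; close the previous one first.
            if current_to is not None:
                total += current_to - current_from
            current_from, current_to = start, end
        else:
            # Overlapping or touching range: extend the current one.
            current_to = max(current_to, end)
    if current_to is not None:
        total += current_to - current_from
    return total


print(get_distance_range_str("10:20;35:45"))
```

For '10:20;35:45' this gives 20, matching the example in the answer.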
Then wrote this query that builds a string for that function for the appropriate prior range. Couldn't use windowing with list_agg, but was able to achieve same with a correlated subquery.
--use list agg to create list of to/from pairs for rows before current row in the ordering
WITH cte_2
AS (
SELECT T1.*
,(
SELECT LISTAGG(FROM_MEAS || ':' || TO_MEAS || ';') WITHIN
GROUP (
ORDER BY YEAR DESC, EVENT_ID DESC
)
FROM road_events T2
WHERE T1.YEAR || lpad(T1.EVENT_ID, 10,'0') <
T2.YEAR || lpad(T2.EVENT_ID, 10,'0')
AND T1.ROAD_ID = T2.ROAD_ID
GROUP BY road_id
) AS PRIOR_RANGES_STR
FROM road_events T1
)
--get distance for prior range string - distance ignoring current row
--get distance including current row
,cte_3
AS (
SELECT cte_2.*
,coalesce(get_distance_range_str(PRIOR_RANGES_STR), 0) AS DIST_PRIOR
,get_distance_range_str(PRIOR_RANGES_STR || FROM_MEAS || ':' || TO_MEAS || ';') AS DIST_NOW
FROM cte_2 cte_2
)
--distance including current row less distance ignoring current row is distance added to the range this row
,cte_4
AS (
SELECT cte_3.*
,DIST_NOW - DIST_PRIOR AS DIST_ADDED_THIS_ROW
FROM cte_3
)
SELECT *
FROM cte_4
--filter out any rows with distance added as 0
WHERE DIST_ADDED_THIS_ROW > 0
ORDER BY ROAD_ID, YEAR DESC, EVENT_ID DESC
sqlfiddle here: http://sqlfiddle.com/#!4/81331/36
Looks to me like the results match yours. I left the additional columns in the final query to try to illustrate each step.
Works on the test case - might need some work to handle all possibilities in a larger data set, but I think this would be a good place to start and refine.
Credit for Overlapping range merge is first answer here: Merge overlapping date intervals
Credit for list_agg with windowing is first answer here: LISTAGG equivalent with windowing clause
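The "distance now minus distance prior" step can also be sketched in Python, assuming the same ordering as the query (year descending, then event_id descending within a road). The function names are hypothetical and the sketch works on tuples instead of range strings.

```python
# (event_id, road_id, year, from_meas, to_meas) copied from the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (6, 2, 2000, 10, 30), (7, 2, 1975, 30, 60), (8, 2, 1950, 50, 90),
    (9, 3, 2050, 40, 90),
    (10, 4, 2040, 0, 200), (11, 4, 2013, 0, 199), (12, 4, 2001, 0, 200),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def covered_length(ranges):
    """Total length of the union of (from, to) ranges."""
    total, current_to = 0, None
    for start, end in sorted(ranges):
        if current_to is None or start > current_to:
            total += end - start          # disjoint range, count it fully
            current_to = end
        elif end > current_to:
            total += end - current_to     # count only the part that sticks out
            current_to = end
    return total


def distance_added(events):
    """Return {event_id: DIST_NOW - DIST_PRIOR} for rows that add any distance."""
    # road_id ascending, then year descending, then event_id descending
    ordered = sorted(events, key=lambda e: (e[1], -e[2], -e[0]))
    prior_ranges = {}  # road_id -> ranges of the rows processed so far
    added = {}
    for event_id, road_id, _year, from_meas, to_meas in ordered:
        ranges = prior_ranges.setdefault(road_id, [])
        dist_prior = covered_length(ranges)
        ranges.append((from_meas, to_meas))
        dist_now = covered_length(ranges)
        if dist_now - dist_prior > 0:
            added[event_id] = dist_now - dist_prior
    return added


print(distance_added(EVENTS))
```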
Answer 4:
I had a problem with your road_events data, because you don't describe what the 1st meas is; I assume it is the interval from 0 up to, but not including, 1.
With that assumption, you can compute this with one query:
with newest_MEAS as (
select ROAD_ID, MEAS.m, max(year) y
from road_events
join (select rownum -1 m
from dual
connect by rownum -1 <= (select max(TOTAL_ROAD_LENGTH) from road_events) ) MEAS
on MEAS.m between FROM_MEAS and TO_MEAS - 1
group by ROAD_ID, MEAS.m )
select re.event_id, nm.ROAD_ID, re.total_road_length, nm.y, count(nm.m) EVENT_LENGTH
from newest_MEAS nm
join road_events re
on nm.ROAD_ID = re.ROAD_ID
and nm.m between re.from_meas and re.to_meas -1
and nm.y = re.year
group by re.event_id, nm.ROAD_ID, re.total_road_length, nm.y
order by event_id
SQL Fiddle
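The per-unit idea behind the query above can be sketched in Python as follows. This is only an illustration under stated assumptions: measures are whole numbers, unit m stands for the interval from m up to (but not including) m + 1, and ties on year are broken by the higher event_id, which is my own choice and not something the SQL does. The function name is hypothetical.

```python
# (event_id, road_id, year, from_meas, to_meas) copied from the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (6, 2, 2000, 10, 30), (7, 2, 1975, 30, 60), (8, 2, 1950, 50, 90),
    (9, 3, 2050, 40, 90),
    (10, 4, 2040, 0, 200), (11, 4, 2013, 0, 199), (12, 4, 2001, 0, 200),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def per_unit_lengths(events):
    """Return {event_id: number of unit intervals where that event is the newest}."""
    newest = {}  # (road_id, unit) -> (year, event_id) of the newest covering event
    for event_id, road_id, year, from_meas, to_meas in events:
        for unit in range(from_meas, to_meas):  # half-open: to_meas itself is excluded
            key = (road_id, unit)
            if key not in newest or (year, event_id) > newest[key]:
                newest[key] = (year, event_id)

    lengths = {}
    for _year, event_id in newest.values():
        lengths[event_id] = lengths.get(event_id, 0) + 1
    return lengths


print(per_unit_lengths(EVENTS))
```

The approach is simple but its cost grows with road length, and it cannot represent fractional measures, which the number(10,2) columns would allow.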
Answer 5:
Solution:
SELECT RE.road_id, RE.event_id, RE.year, RE.from_meas, RE.to_meas, RE.road_length, RE.event_length, RE.used_length, RE.leftover_length
FROM
(
SELECT RE.C_road_id[road_id], RE.C_event_id[event_id], RE.C_year[year], RE.C_from_meas[from_meas], RE.C_to_meas[to_meas], RE.C_road_length[road_length],
RE.event_length, RE.used_length, (RE.event_length - (CASE WHEN RE.HasOverlap = 1 THEN RE.used_length ELSE 0 END))[leftover_length]
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
(CASE WHEN MAX(RE.A_event_id) IS NOT NULL THEN 1 ELSE 0 END)[HasOverlap],
(RE.C_to_meas - RE.C_from_meas)[event_length],
SUM( (CASE WHEN RE.O_to_meas <= RE.C_to_meas THEN RE.O_to_meas ELSE RE.C_to_meas END)
- (CASE WHEN RE.O_from_meas >= RE.C_from_meas THEN RE.O_from_meas ELSE RE.C_from_meas END)
)[used_length]--This is the length that is already being counted towards later years.
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id, MIN(RE.O_from_meas)[O_from_meas], MAX(RE.O_to_meas)[O_to_meas]
FROM
(
SELECT RE_C.road_id[C_road_id], RE_C.event_id[C_event_id], RE_C.year[C_year], RE_C.from_meas[C_from_meas], RE_C.to_meas[C_to_meas], RE_C.total_road_length[C_road_length],
RE_A.road_id[A_road_id], RE_A.event_id[A_event_id], RE_A.year[A_year], RE_A.from_meas[A_from_meas], RE_A.to_meas[A_to_meas], RE_A.total_road_length[A_road_length],
RE_O.road_id[O_road_id], RE_O.event_id[O_event_id], RE_O.year[O_year], RE_O.from_meas[O_from_meas], RE_O.to_meas[O_to_meas], RE_O.total_road_length[O_road_length],
(ROW_NUMBER() OVER (PARTITION BY RE_C.road_id, RE_C.event_id, RE_O.event_id ORDER BY RE_S.Overlap DESC, RE_A.event_id))[RowNum]--Use to Group Overlaps into Swaths.
FROM road_events as RE_C--Current.
LEFT JOIN road_events as RE_A--After. --Use a Left-Join to capture when there is only 1 Event (or it is the Last-Event in the list).
ON RE_A.road_id = RE_C.road_id
AND RE_A.event_id != RE_C.event_id--Not the same EventID.
AND RE_A.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_A.from_meas >= RE_C.from_meas AND RE_A.from_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas >= RE_C.from_meas AND RE_A.to_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas = RE_C.to_meas AND RE_A.from_meas = RE_C.from_meas)--They are Equal.
)
LEFT JOIN road_events as RE_O--Overlapped/Linked.
ON RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
OUTER APPLY
(
SELECT COUNT(*)[Overlap]
FROM road_events as RE_O--Overlapped/Linked.
WHERE RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
) AS RE_S--Swath of Overlaps.
) AS RE
WHERE RE.RowNum = 1--Remove Duplicates and Select those that are in the biggest Swaths.
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id
) AS RE
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length
) AS RE
) AS RE
WHERE RE.leftover_length > 0--Filter out Events that had their entire Segments overlapped by a Later Event(s).
ORDER BY RE.road_id, RE.year DESC, RE.event_id
SQL Fiddle:
http://sqlfiddle.com/#!18/2880b/1
Added Rules/Assumptions/Clarifications:
1.) Allow for the possibility that event_id and road_id could be GUIDs or created out of order,
so do not script assuming higher or lower values give meaning to the relationship of records.
For Example:
An ID of 1 and an ID of 2 does not guarantee the ID of 2 is the most recent one (and vice-versa).
This is so the solution will be more general and less "hacky".
2.) Filter out Events that had their entire Segments overlapped by a Later Event(s).
For Example:
If 2008 had work on 20-50 and 2009 had work on 10-60,
then the Event for 2008 would be filtered out because its entire Segment was rehashed in 2009.
Additional Test Data:
To ensure solutions are not tailored to only the DataSet given,
I have added a road_id of 6 to the original DataSet, in order to hit a few more fringe-cases.
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (16,6,2012,0,100,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (17,6,2013,68,69,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (18,6,2014,65,66,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (19,6,2015,62,63,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (20,6,2016,50,60,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (21,6,2017,30,40,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (22,6,2017,20,55,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (23,6,2018,0,25,100);
Results: (with the 8 Additional Records I added in Green)
Database Version:
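This Solution is intended to be Oracle and SQL-Server Agnostic:
It Should Work in SS2008+, where it was tested. For Oracle 12c+ the logic should carry over, but the bracketed column aliases (e.g. [road_id]) and the AS keyword before table aliases are SQL-Server syntax and would need to be rewritten.
This question is tagged with Oracle 12c, but there is no online-fiddle I may use without signing up,
so I tested it in SQL-Server only.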
I rely on Cross-Apply and Outer-Apply for most of my queries.
Oracle introduced these "Joins" in 12c:
https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1
Simplified and Performant:
This uses:
• No Correlated Subqueries.
• No Recursion.
• No CTE's.
• No Unions.
• No User Functions.
Indexes:
I read in one of your comments you had asked about Indexes.
I would add 1-Column Indexes for each of the main Fields you will be searching and grouping on: road_id, event_id, and year.
You could see if this index would help you any (I don't know how you plan to use the data):
Key Fields: road_id, event_id, year
Include: from_meas, to_meas
Title:
You may want to consider Renaming the Title of this Question to something more searchable like:
"Aggregate Overlapping Segments to Measure Effective Length".
This would allow the solution to be easier to find for helping others with similar problems.
Other Thoughts:
Something like this would be useful in Tallying up the Overall-Time spent on something
with overlapping Start and Stop timestamps.
Answer 6:
This expands the table to produce a row for each mile of each road, and simply takes the MAX year. We can then just COUNT the number of rows to produce the event_length.
It produces the table exactly as you specified above.
Note: I ran this query against SQL Server. You could use LEAST instead of SELECT MIN(event_length) FROM (VALUES...) in Oracle I think.
WITH NumberRange(result) AS
(
SELECT 0
UNION ALL
SELECT result + 1
FROM NumberRange
WHERE result < 301 --Max length of any road
),
CurrentRoadEventLength(road_id, [year], event_length) AS
(
SELECT road_id, [year], COUNT(*) AS event_length
FROM (
SELECT re.road_id, n.result, MAX(re.[year]) as [year]
FROM road_events re INNER JOIN NumberRange n
ON ( re.from_meas <= n.result
AND re.to_meas > n.result
)
GROUP BY re.road_id, n.result
) events_per_mile
GROUP BY road_id, [year]
)
SELECT re.event_id, re.road_id, re.[year], re.total_road_length,
(SELECT MIN(event_length) FROM (VALUES (re.to_meas - re.from_meas), (cre.event_length)) AS EventLengths(event_length))
FROM road_events re INNER JOIN CurrentRoadEventLength cre
ON ( re.road_id = cre.road_id
AND re.[year] = cre.[year]
)
ORDER BY re.event_id, re.road_id
OPTION (MAXRECURSION 301) --Max length of any road
Source: https://stackoverflow.com/questions/52081473/aggregate-overlapping-segments-to-measure-effective-length