Question
I need to merge overlapping periods (defined by FROM and TO variables) of sequential events (with identifier NUM) for each group (ID) with a "lookahead buffer", meaning that if the next period starts within the buffer zone, they should be merged.
For instance, in the example below, the second event (NUM = 2) starts at time 13, which is within the buffer zone of the first event (10 + 5 = 15).
The tricky part here compared to other similar problems I've found is that although the buffer period has a fixed value for each event, this could potentially change if it is merged with an event (only backwards) that has a longer buffer period.
For instance, event 3 is merged into the same period as events 1 and 2, and because the buffer periods of those events are longer, the following buffer zone should be 25 + 5 = 30 rather than 25 + 3 = 28, meaning that event 4 should be included in this period as well.
Once again, the buffer period of event 4 is changed to 5. However, because 40 > 31 + 5, the last event is a separate observation.
CREATE TABLE MY_TABLE(ID INTEGER, NUM INTEGER, FROM INTEGER, TO INTEGER, LOOKAHEAD INTEGER);
INSERT INTO MY_TABLE VALUES (1, 1, 1, 10, 5);
INSERT INTO MY_TABLE VALUES (1, 2, 13, 20, 5);
INSERT INTO MY_TABLE VALUES (1, 3, 21, 25, 3);
INSERT INTO MY_TABLE VALUES (1, 4, 29, 31, 3);
INSERT INTO MY_TABLE VALUES (1, 5, 40, 50, 3);
Eventually, the result I need is two observations with the two disjoint periods:
(ID = 1, FROM = 1, TO = 31)
(ID = 1, FROM = 40, TO = 50)
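The merge rule described above can be sketched procedurally. The Python below is only an illustration of the intended logic, not a Teradata solution. The names `Event` and `merge_with_lookahead` are hypothetical, and the strict comparison (next FROM < TO + buffer) is an assumption taken from the join condition stated in the question.

```python
from typing import NamedTuple


class Event(NamedTuple):
    id: int
    num: int
    start: int      # FROM in the question
    end: int        # TO in the question
    lookahead: int


def merge_with_lookahead(events):
    """Merge sequential events per id.

    The buffer of a merged period is the largest lookahead of any
    event already merged into it (assumption: strict '<' comparison).
    """
    merged = []   # each entry is [id, from, to]
    buffer = 0    # effective lookahead of the period being built
    for ev in sorted(events, key=lambda e: (e.id, e.num)):
        last = merged[-1] if merged else None
        if last is not None and last[0] == ev.id and ev.start < last[2] + buffer:
            # The event starts inside the buffer zone: extend the period
            # and carry the larger lookahead forward.
            last[2] = max(last[2], ev.end)
            buffer = max(buffer, ev.lookahead)
        else:
            # Start a new period with this event's own lookahead.
            merged.append([ev.id, ev.start, ev.end])
            buffer = ev.lookahead
    return [tuple(m) for m in merged]


sample = [
    Event(1, 1, 1, 10, 5),
    Event(1, 2, 13, 20, 5),
    Event(1, 3, 21, 25, 3),
    Event(1, 4, 29, 31, 3),
    Event(1, 5, 40, 50, 3),
]
merged = merge_with_lookahead(sample)
print(merged)  # [(1, 1, 31), (1, 40, 50)]
```

Event 4 is merged only because the buffer of 5 is carried forward from events 1 and 2; with its predecessor's own lookahead of 3 the zone would end at 28.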
Naturally, I initially thought I could derive the effective lookahead with OLAP functions, by creating a new variable LOOKAHEAD2 that is the maximum of the previous value of LOOKAHEAD2 and the current value of LOOKAHEAD, conditional on FROM (this record) < TO + LOOKAHEAD (previous record). This doesn't work, however, since LOOKAHEAD2 would be defined in terms of itself.
Instead, I tried using recursive queries, where I start with the first event (NUM = 1), and then recursively join the table with the next event (root.NUM + 1 = next.NUM) conditional on (root.TO + root.LOOKAHEAD > next.FROM), also updating the LOOKAHEAD variable accordingly.
But I have never used recursive queries before, and I can't get it to join on the updated value of LOOKAHEAD.
Does anyone know how to solve this, either with recursive queries or with some other approach?
Answer 1:
You should use the RESET WHEN window modifier in your analytic functions (LAG in Teradata 16, or MAX in earlier versions); don't use a recursive query.
Update:
DROP TABLE MY_TABLE;
CREATE VOLATILE TABLE MY_TABLE
  ( id         INTEGER
  , num        INTEGER
  , from_value INTEGER
  , to_value   INTEGER
  , lookahead  INTEGER
  ) ON COMMIT PRESERVE ROWS;
INSERT INTO MY_TABLE VALUES (1, 1, 1, 10, 5);
INSERT INTO MY_TABLE VALUES (1, 2, 13, 20, 5);
INSERT INTO MY_TABLE VALUES (1, 3, 21, 25, 3);
INSERT INTO MY_TABLE VALUES (1, 4, 29, 31, 3);
INSERT INTO MY_TABLE VALUES (1, 5, 40, 50, 3);
INSERT INTO MY_TABLE VALUES (2, 1, 1, 10, 5);
INSERT INTO MY_TABLE VALUES (2, 2, 20, 30, 15);
INSERT INTO MY_TABLE VALUES (2, 3, 40, 41, 5);
INSERT INTO MY_TABLE VALUES (2, 4, 100, 200, 5);
INSERT INTO MY_TABLE VALUES (2, 5, 300, 400, 3);
SELECT id, first_from_value, to_value
FROM ( SELECT id
            , to_value
            , CASE WHEN overlaps_flag = 1
                   THEN NULL
                   ELSE COALESCE
                        ( MIN (from_value)
                            OVER (PARTITION BY id
                                  ORDER BY from_value
                                  RESET WHEN MAX (overlaps_flag)
                                             OVER (PARTITION BY id
                                                   ROWS BETWEEN
                                                   1 PRECEDING
                                                   AND 1 PRECEDING) = 0
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND 1 PRECEDING)
                        , from_value )
              END AS first_from_value
       FROM ( SELECT id, from_value, to_value
                   , MAX (from_value)
                       OVER (PARTITION BY id
                             ORDER BY from_value
                             ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
                     AS next_from_value
                   , CASE WHEN to_value + lookahead + 1 >= next_from_value
                          THEN 1 ELSE 0
                     END AS overlaps_flag
              FROM my_table
            ) AS a
     ) AS a
WHERE first_from_value IS NOT NULL
ORDER BY 1, 2;
id  first_from_value  to_value
 1                 1        31
 1                40        50
 2                 1        10
 2                20        41
 2               100       200
 2               300       400
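The query above compares each row with its own lookahead value. For comparison, the recursive formulation described in the question, which carries the larger buffer forward explicitly, can be sketched as follows. This is a minimal sketch under stated assumptions: it runs against SQLite through Python's `sqlite3` module purely so that it can be executed anywhere, it has not been tested on Teradata, and the CTE name `walk` and the `grp_*` column names are made up for illustration. It also assumes that NUM starts at 1 and has no gaps within each ID, and it uses the strict comparison from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table
  (id INTEGER, num INTEGER, from_value INTEGER, to_value INTEGER, lookahead INTEGER);
INSERT INTO my_table VALUES
  (1, 1, 1, 10, 5), (1, 2, 13, 20, 5), (1, 3, 21, 25, 3),
  (1, 4, 29, 31, 3), (1, 5, 40, 50, 3),
  (2, 1, 1, 10, 5), (2, 2, 20, 30, 15), (2, 3, 40, 41, 5),
  (2, 4, 100, 200, 5), (2, 5, 300, 400, 3);
""")

QUERY = """
WITH RECURSIVE walk (id, num, grp_from, grp_to, grp_lookahead) AS (
    -- Seed: the first event of every id opens the first period.
    SELECT id, num, from_value, to_value, lookahead
    FROM my_table
    WHERE num = 1
    UNION ALL
    -- Step: visit the next event and either extend the current period
    -- (keeping the larger lookahead) or open a new one.
    SELECT nxt.id
         , nxt.num
         , CASE WHEN nxt.from_value < cur.grp_to + cur.grp_lookahead
                THEN cur.grp_from ELSE nxt.from_value END
         , CASE WHEN nxt.from_value < cur.grp_to + cur.grp_lookahead
                 AND cur.grp_to > nxt.to_value
                THEN cur.grp_to ELSE nxt.to_value END
         , CASE WHEN nxt.from_value < cur.grp_to + cur.grp_lookahead
                 AND cur.grp_lookahead > nxt.lookahead
                THEN cur.grp_lookahead ELSE nxt.lookahead END
    FROM walk AS cur
    JOIN my_table AS nxt
      ON nxt.id = cur.id
     AND nxt.num = cur.num + 1
)
-- Every row of a period shares grp_from; the last row holds the final end.
SELECT id, grp_from, MAX(grp_to) AS grp_to
FROM walk
GROUP BY id, grp_from
ORDER BY id, grp_from
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this yields the same six periods as the result table above. The CASE expressions stand in for a two-argument maximum so that the SQL stays close to what most dialects accept.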
Source: https://stackoverflow.com/questions/51879557/aggregate-periods-using-recursive-queries