I just can\'t seem to get this query figured out. I need to combine rows of time-consecutive states into a single state.
This question is similar to the question fou
Here's another approach:
SELECT
name,
min(start_inst) AS start_inst,
max(end_inst) AS end_inst,
code,
subcode
FROM
(
SELECT
A.*,
COUNT
(
CASE WHEN start_inst = previous_end_inst THEN NULL
ELSE 1
END
)
OVER
(
ORDER BY
start_inst,
name,
code,
subcode
) AS group_number
FROM
(
SELECT
name,
start_inst,
end_inst,
LAG
(
end_inst
)
OVER
(
PARTITION BY
name,
code,
subcode
ORDER BY
start_inst
) AS previous_end_inst,
code,
subcode
FROM
data
) A
) B
GROUP BY
name,
code,
subcode,
group_number
ORDER BY
group_number
Basically:
For each row, subquery A finds the previous end time for the given name, code, and subcode.
For each row, subquery B calculates the "group number" -- a running count of preceeding rows (in order of start_inst, name, code, and subcode) where the previous end time calculated in Step 1 is not equal to the start time.
The outer query aggregates by group number.
For better or worse, this approach, unlike @stevo's, will create a new "group" if there's a "gap" between the end time of one record and the start time of the next. For example, if you were to create a gap between 12:57 and 13:00 like this...
UPDATE data
SET start_inst = TO_DATE('9/12/2011 13:00', 'MM/DD/YYYY HH24:MI')
WHERE start_inst = TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI');
...the query above would return two rows like this...
NAME START_INST END_INST CODE SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1 09/12/2011 12:26 09/12/2011 12:57 161 71
Person1 09/12/2011 13:00 09/12/2011 13:07 161 71
.
.
.
...whereas @stevo's query would return one row like this...
NAME START_INST END_INST CODE SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1 12/09/2011 12:26 12/09/2011 13:07 161 71
.
.
.
Hope this helps.
adapting desm's query, I think this should work
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
d.*
FROM
data d
)
SELECT
name,
to_char(MIN(start_inst),'DD/MM/YYYY HH24:MI') start_inst,
to_char(MAX(end_inst),'DD/MM/YYYY HH24:MI') end_inst,
code,
subcode
FROM
sequenced_data
GROUP BY
name,
code,
subcode,
NameSequenceID - NameStateSequenceID
ORDER BY name,start_inst
You can do that with a recursive query (something with CONNECT BY / PRIOR in oracle, IIRC) I did the same thing for Postgres in this thread : Get total time interval from multiple rows if sequence not broken
It might need a bit of reworking to make it fit into the oracle syntax.
Here is a solution using a recursive query instead of analytic functions (as suggested by @wildplasser):
SELECT name, code, subcode, MIN(start_inst) AS start_inst, MAX(end_inst) AS end_inst
FROM (SELECT name,
start_inst,
end_inst,
code,
subcode,
MIN(CONNECT_BY_ROOT (start_inst)) AS root_start
FROM data d
CONNECT BY PRIOR name = name
AND PRIOR end_inst = start_inst
AND PRIOR code = code
AND PRIOR subcode = subcode
GROUP BY name, start_inst, end_inst, code, subcode)
GROUP BY name, code, subcode, root_start;
The connect by
clause in the innermost query causes the data to be returned in a hierarchical fashion. connect_by_root
gives us the value at the root of each branch. Because we don't have a good candidate for a start with
clause, we'll get all child rows (where end_inst
equals another row's start_inst
and all other columns are the same) multiple times: once as a root and once (or more) as a branch. Taking the min
of the root eliminates these extra rows while giving us a value to group on in the outer query.
In the outer query, we perform another group by
to consolidate the rows. The difference is that, in this case, we have root_start
there as well to identify which rows are consecutive and therefore need to be consolidated.
Maybe this? (I don't have a SQL machine to run it on)
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
*
FROM
data
)
SELECT
name,
MIN(start_inst) start_inst,
MAX(end_inst) end_inst,
code,
subcode
FROM
sequenced_data
GROUP BY
name,
code,
subcode,
NameSequenceID - NameStateSequenceID