Combine rows when the end time of one is the start time of another (Oracle)

前端 未结 5 433
旧时难觅i
旧时难觅i 2021-01-18 15:49

I just can\'t seem to get this query figured out. I need to combine rows of time-consecutive states into a single state.

This question is similar to the question fou

相关标签:
5条回答
  • 2021-01-18 16:13

    Here's another approach:

    SELECT
        name,
        min(start_inst) AS start_inst,
        max(end_inst) AS end_inst,
        code,
        subcode
    FROM
        (
            SELECT
                A.*,
                COUNT
                (
                    CASE WHEN start_inst = previous_end_inst THEN NULL
                    ELSE 1
                    END
                )
                OVER
                (
                    ORDER BY
                        start_inst,
                        name,
                        code,
                        subcode
                ) AS group_number
            FROM
                (
                    SELECT
                        name,
                        start_inst,
                        end_inst,
                        LAG
                        (
                          end_inst
                        )
                        OVER
                        (
                            PARTITION BY
                                name,
                                code,
                                subcode
                            ORDER BY
                                start_inst
                        ) AS previous_end_inst,
                        code,
                        subcode
                    FROM
                        data
                ) A
            ) B
    GROUP BY
        name,
        code,
        subcode,
        group_number
    ORDER BY
        group_number
    

    Basically:

    1. For each row, subquery A finds the previous end time for the given name, code, and subcode.

    2. For each row, subquery B calculates the "group number" -- a running count of preceeding rows (in order of start_inst, name, code, and subcode) where the previous end time calculated in Step 1 is not equal to the start time.

    3. The outer query aggregates by group number.

    For better or worse, this approach, unlike @stevo's, will create a new "group" if there's a "gap" between the end time of one record and the start time of the next. For example, if you were to create a gap between 12:57 and 13:00 like this...

    UPDATE data
    SET start_inst = TO_DATE('9/12/2011 13:00', 'MM/DD/YYYY HH24:MI')
    WHERE start_inst = TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI');
    

    ...the query above would return two rows like this...

    NAME                 START_INST       END_INST               CODE    SUBCODE
    -------------------- ---------------- ---------------- ---------- ----------
    .
    .
    .
    Person1              09/12/2011 12:26 09/12/2011 12:57        161         71
    Person1              09/12/2011 13:00 09/12/2011 13:07        161         71
    .
    .
    .
    

    ...whereas @stevo's query would return one row like this...

    NAME                 START_INST       END_INST               CODE    SUBCODE
    -------------------- ---------------- ---------------- ---------- ----------
    .
    .
    .
    Person1              12/09/2011 12:26 12/09/2011 13:07        161         71
    .
    .
    .
    

    Hope this helps.

    0 讨论(0)
  • 2021-01-18 16:17

    adapting desm's query, I think this should work

    WITH
      sequenced_data AS
    (
    SELECT
    ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
    ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst)     NameStateSequenceID,
    d.*
    FROM
    data d
    ) 
    SELECT
      name,
      to_char(MIN(start_inst),'DD/MM/YYYY HH24:MI') start_inst,
      to_char(MAX(end_inst),'DD/MM/YYYY HH24:MI')   end_inst,
      code,
      subcode
    FROM
      sequenced_data
    GROUP BY
      name,
      code,
      subcode,
      NameSequenceID - NameStateSequenceID
    ORDER BY name,start_inst
    
    0 讨论(0)
  • 2021-01-18 16:23

    You can do that with a recursive query (something with CONNECT BY / PRIOR in oracle, IIRC) I did the same thing for Postgres in this thread : Get total time interval from multiple rows if sequence not broken

    It might need a bit of reworking to make it fit into the oracle syntax.

    0 讨论(0)
  • 2021-01-18 16:24

    Here is a solution using a recursive query instead of analytic functions (as suggested by @wildplasser):

    SELECT   name, code, subcode, MIN(start_inst) AS start_inst, MAX(end_inst) AS end_inst
    FROM     (SELECT     name,
                         start_inst,
                         end_inst,
                         code,
                         subcode,
                         MIN(CONNECT_BY_ROOT (start_inst)) AS root_start
              FROM       data d
              CONNECT BY PRIOR name = name 
                     AND PRIOR end_inst = start_inst 
                     AND PRIOR code = code 
                     AND PRIOR subcode = subcode
              GROUP BY   name, start_inst, end_inst, code, subcode)
    GROUP BY name, code, subcode, root_start;
    

    The connect by clause in the innermost query causes the data to be returned in a hierarchical fashion. connect_by_root gives us the value at the root of each branch. Because we don't have a good candidate for a start with clause, we'll get all child rows (where end_inst equals another row's start_inst and all other columns are the same) multiple times: once as a root and once (or more) as a branch. Taking the min of the root eliminates these extra rows while giving us a value to group on in the outer query.

    In the outer query, we perform another group by to consolidate the rows. The difference is that, in this case, we have root_start there as well to identify which rows are consecutive and therefore need to be consolidated.

    0 讨论(0)
  • 2021-01-18 16:27

    Maybe this? (I don't have a SQL machine to run it on)

    WITH
      sequenced_data AS
    (
      SELECT
        ROW_NUMBER() OVER (PARTITION BY name                ORDER BY start_inst) NameSequenceID,
        ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
        *
      FROM
        data
    )
    SELECT
      name,
      MIN(start_inst) start_inst,
      MAX(end_inst)   end_inst,
      code,
      subcode
    FROM
      sequenced_data
    GROUP BY
      name,
      code,
      subcode,
      NameSequenceID - NameStateSequenceID
    
    0 讨论(0)
提交回复
热议问题