Sum duration of overlapping periods with priority by excluding the overlap itself

攒了一身酷 2021-01-22 19:16

I have some R code that I am trying to rewrite in PostgreSQL to feed a Grafana dashboard. I have the basics down, so I am almost done with the other parts of the script, but what I

2 answers
  • 2021-01-22 19:48

    This is a type of gaps-and-islands problem. To solve this, find where the "islands" begin and then aggregate. So, to get the islands:

    select a.name, min(start) as startt, max("end") as endt
    from (select a.*,
                 count(*) filter (where prev_end is null or prev_end < start) over (partition by name order by start, id) as grp
          from (select a.*,
                       max("end") over (partition by name
                                        order by start, id
                                        rows between unbounded preceding and 1 preceding
                                       ) as prev_end
                from activities a
               ) a
         ) a
    group by name, grp;
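
To make the windowing logic concrete, here is a minimal Python sketch of the same idea as the query above: order each name's rows by start, keep a running maximum of the end times seen so far, and open a new island whenever that maximum is missing or falls before the current start. The names `find_islands` and `ts` are hypothetical helpers, and the sample rows are copied from the `activities` data shown in the result tables of the second answer on this page.

```python
from datetime import datetime
from itertools import groupby

def find_islands(rows):
    """rows: (id, name, start, end) tuples -> list of (name, island_start, island_end)."""
    islands = []
    # Same ordering as the window: partition by name, order by start, id.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for name, group in groupby(ordered, key=lambda r: r[1]):
        prev_end = None  # plays the role of max("end") over all preceding rows
        for _id, _name, start, end in group:
            if prev_end is None or prev_end < start:
                islands.append([name, start, end])         # a new island begins
            else:
                islands[-1][2] = max(islands[-1][2], end)  # extend the current island
            prev_end = end if prev_end is None else max(prev_end, end)
    return [tuple(island) for island in islands]

def ts(hhmm):
    # Hypothetical helper: all sample rows fall on 2018-01-09.
    return datetime.strptime("2018-01-09 " + hhmm, "%Y-%m-%d %H:%M")

# Sample rows (assumed to match the activities table used on this page).
rows = [
    (1, "A", ts("17:00"), ts("20:00")),
    (2, "A", ts("18:00"), ts("20:30")),
    (3, "B", ts("19:00"), ts("21:30")),
    (4, "B", ts("22:00"), ts("23:00")),
]

islands = find_islands(rows)
for island in islands:
    print(island)
```

With these rows the sketch yields one island for A (17:00 to 20:30) and two for B, which is what the SQL grouping by `name, grp` is meant to produce.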
    

    The next step is just to aggregate again:

    with islands as (
          select a.name, min(start) as startt, max("end") as endt
          from (select a.*,
                       count(*) filter (where prev_end is null or prev_end < start) over (partition by name order by start, id) as grp
                from (select a.*,
                             max("end") over (partition by name
                                              order by start, id
                                              rows between unbounded preceding and 1 preceding
                                             ) as prev_end
                      from activities a
                     ) a
               ) a
          group by name, grp
         )
    select name, sum(endt - startt)
    from islands i
    group by name;
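
The final aggregation is only a per-name sum of island lengths. A minimal Python sketch of that step follows; `total_per_name` is a hypothetical helper, and the island list is an assumed input that corresponds to the sample data used elsewhere on this page.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def total_per_name(islands):
    """Sum (end - start) over all islands of each name, like sum(endt - startt)."""
    totals = defaultdict(timedelta)  # timedelta() is a zero-length duration
    for name, start, end in islands:
        totals[name] += end - start
    return dict(totals)

# Assumed islands for the sample data (all on 2018-01-09).
day = (2018, 1, 9)
islands = [
    ("A", datetime(*day, 17, 0), datetime(*day, 20, 30)),
    ("B", datetime(*day, 19, 0), datetime(*day, 21, 30)),
    ("B", datetime(*day, 22, 0), datetime(*day, 23, 0)),
]

totals = total_per_name(islands)
print(totals)
```

Note that this gives the covered time per name only; any cross-name priority rule would have to be applied as a further step.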
    

    Here is a db<>fiddle.

    Note that this uses a cumulative trailing maximum to define the overlaps. This is the most general method for determining overlaps. I think this will work on all edge cases, including:

    1----------2---2----3--3-----1
    

    It also handles ties on the start time.
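
The nested case in the diagram is where comparing against only the immediately preceding row goes wrong. The sketch below is a small Python illustration, using made-up integer timestamps for the three intervals, that counts islands both ways; `count_islands` and its `use_running_max` flag are hypothetical names introduced for this example.

```python
def count_islands(intervals, use_running_max):
    """Count islands among (start, end) pairs.

    use_running_max=True  compares each start with the max end seen so far.
    use_running_max=False compares each start with the previous row's end only.
    """
    count = 0
    prev_end = None
    for start, end in sorted(intervals):
        if prev_end is None or prev_end < start:
            count += 1  # nothing seen so far reaches this start: new island
        if use_running_max and prev_end is not None:
            prev_end = max(prev_end, end)
        else:
            prev_end = end
    return count

# Illustrative values for 1----------2---2----3--3-----1
# Interval 1 spans everything; intervals 2 and 3 sit inside it.
intervals = [(0, 30), (10, 14), (19, 22)]

with_running_max = count_islands(intervals, use_running_max=True)
previous_row_only = count_islands(intervals, use_running_max=False)
print(with_running_max, previous_row_only)
```

With the running maximum, interval 3 is still covered by interval 1, so there is a single island. With the previous-row comparison, interval 3 starts after interval 2 ends and is wrongly treated as a new island.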

  • 2021-01-22 19:59

    Update: My original solution was not correct. The consolidation of ranges cannot be handled in a regular window. I confused myself by reusing the name trange, forgetting that the window operates over the source rows rather than the result rows. Please see the updated SQL Fiddle for the full query, along with an added record that illustrates the problem.

    You can simplify both the overlap check and the identification of gaps and islands by using PostgreSQL range types.

    The following query is intentionally verbose to show each step of the process. A number of steps can be combined.

    SQL Fiddle

    First, add an inclusive [start, end] range to each record.

    with add_ranges as (
      select id, name, tsrange(start, "end", '[]') as t_range
        from activities
    ), 
    
     id | name |                    t_range                    
    ----+------+-----------------------------------------------
      1 | A    | ["2018-01-09 17:00:00","2018-01-09 20:00:00"]
      2 | A    | ["2018-01-09 18:00:00","2018-01-09 20:30:00"]
      3 | B    | ["2018-01-09 19:00:00","2018-01-09 21:30:00"]
      4 | B    | ["2018-01-09 22:00:00","2018-01-09 23:00:00"]
    (4 rows)
    

    Identify overlapping ranges as determined by the && operator and mark the beginning of new islands with a 1.

    mark_islands as (
      select id, name, t_range,
             case
               when t_range && lag(t_range) over w then 0
               else 1
             end as new_range
        from add_ranges
      window w as (partition by name order by t_range)
    ),
    
     id | name |                    t_range                    | new_range 
    ----+------+-----------------------------------------------+-----------
      1 | A    | ["2018-01-09 17:00:00","2018-01-09 20:00:00"] |         1
      2 | A    | ["2018-01-09 18:00:00","2018-01-09 20:30:00"] |         0
      3 | B    | ["2018-01-09 19:00:00","2018-01-09 21:30:00"] |         1
      4 | B    | ["2018-01-09 22:00:00","2018-01-09 23:00:00"] |         1
    (4 rows)
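
As a rough illustration of what this step computes, here is a Python sketch of the closed-range overlap test and the new-island flag. It assumes zero-padded "HH:MM" strings on a single day (which compare in chronological order) in place of `tsrange` values; `overlaps` and `mark_new_ranges` are hypothetical helpers. Like the `lag`-based query, it compares each range only with the one immediately before it.

```python
def overlaps(a, b):
    """Closed ranges [lo, hi] overlap when each one starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]

def mark_new_ranges(ranges):
    """Return 1 where a range does not overlap the previous one (or has none), else 0."""
    flags = []
    prev = None
    for current in sorted(ranges):  # ordered by lower bound, then upper bound
        flags.append(0 if prev is not None and overlaps(current, prev) else 1)
        prev = current
    return flags

# Sample ranges per name, taken from the table above (times on 2018-01-09).
a_ranges = [("17:00", "20:00"), ("18:00", "20:30")]
b_ranges = [("19:00", "21:30"), ("22:00", "23:00")]

a_flags = mark_new_ranges(a_ranges)
b_flags = mark_new_ranges(b_ranges)
print(a_flags, b_flags)
```

The first row of each name has no predecessor, which corresponds to `lag` returning NULL and the `case` expression falling through to 1.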
    
    

    Number the groups based on the sum of the new_range within name.

    group_nums as (
      select id, name, t_range, 
             sum(new_range) over (partition by name order by t_range) as group_num
        from mark_islands
    ),
    
     id | name |                    t_range                    | group_num 
    ----+------+-----------------------------------------------+-----------
      1 | A    | ["2018-01-09 17:00:00","2018-01-09 20:00:00"] |         1
      2 | A    | ["2018-01-09 18:00:00","2018-01-09 20:30:00"] |         1
      3 | B    | ["2018-01-09 19:00:00","2018-01-09 21:30:00"] |         1
      4 | B    | ["2018-01-09 22:00:00","2018-01-09 23:00:00"] |         2
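
The group number is just a running total of the flags within each name. A short Python sketch, with the flag values copied from the `new_range` column above and `group_nums` as an illustrative variable name:

```python
from itertools import accumulate

# new_range flags per name, in t_range order (from the mark_islands output).
new_range_flags = {"A": [1, 0], "B": [1, 1]}

# A cumulative sum turns "starts a new island" markers into island numbers.
group_nums = {name: list(accumulate(flags)) for name, flags in new_range_flags.items()}
print(group_nums)
```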
    

    Group by name, group_num to get the total time spent on the island as well as a complete t_range to be used in overlap deduction.

    islands as (
      select name,
             tsrange(min(lower(t_range)), max(upper(t_range)), '[]') as t_range,
             max(upper(t_range)) - min(lower(t_range)) as island_time_interval
        from group_nums
       group by name, group_num
    ),
    
     name |                    t_range                    | island_time_interval 
    ------+-----------------------------------------------+----------------------
     A    | ["2018-01-09 17:00:00","2018-01-09 20:30:00"] | 03:30:00
     B    | ["2018-01-09 19:00:00","2018-01-09 21:30:00"] | 02:30:00
     B    | ["2018-01-09 22:00:00","2018-01-09 23:00:00"] | 01:00:00
    (3 rows)
    
    

    For the requirement to count overlap time between A messages and B messages, find occurrences of when an A message overlaps a B message, and use the * intersect operator to find the intersection.

    priority_overlaps as (
      select b.name, a.t_range * b.t_range as overlap_range
        from islands a
        join islands b
          on a.t_range && b.t_range
         and a.name = 'A' and b.name != 'A'
    ),
    
     name |                 overlap_range                 
    ------+-----------------------------------------------
     B    | ["2018-01-09 19:00:00","2018-01-09 20:30:00"]
    (1 row)
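
For closed ranges, the intersection is the later of the two lower bounds paired with the earlier of the two upper bounds. The Python sketch below shows that under the same simplifying assumption of zero-padded "HH:MM" strings on one day; `intersect` is a hypothetical helper that returns `None` where PostgreSQL would return an empty range.

```python
def intersect(a, b):
    """Intersection of two closed ranges, or None when they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

# Islands from the previous step (times on 2018-01-09).
a_island = ("17:00", "20:30")
b_islands = [("19:00", "21:30"), ("22:00", "23:00")]

priority_overlaps = []
for b_island in b_islands:
    shared = intersect(a_island, b_island)
    if shared is not None:  # mirrors the && join condition
        priority_overlaps.append(shared)

print(priority_overlaps)
```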
    

    Sum the total time of each overlap by name.

    overlap_time as (
      select name, sum(upper(overlap_range) - lower(overlap_range)) as total_overlap_interval
        from priority_overlaps
       group by name
    ),
    
     name | total_overlap_interval 
    ------+------------------------
     B    | 01:30:00
    (1 row)
    

    Calculate the total time for each name.

    island_times as (
      select name, sum(island_time_interval) as name_time_interval
        from islands
       group by name
    )
    
     name | name_time_interval 
    ------+--------------------
     B    | 03:30:00
     A    | 03:30:00
    (2 rows)
    
    

    Join the total time for each name to adjustments from the overlap_time CTE, and subtract the adjustment for the final duration value.

    select i.name,
           i.name_time_interval - coalesce(o.total_overlap_interval, interval '0') as duration
      from island_times i
      left join overlap_time o
        on o.name = i.name
    ;
    
     name | duration 
    ------+----------
     B    | 02:00:00
     A    | 03:30:00
    (2 rows)
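
The last step can be checked by hand. The Python sketch below repeats the arithmetic with the intermediate values from the tables above; the dictionary names mirror the CTE names, and `.get(..., timedelta(0))` stands in for the `left join` plus `coalesce`.

```python
from datetime import timedelta

# Values copied from the island_times and overlap_time outputs above.
island_times = {
    "A": timedelta(hours=3, minutes=30),
    "B": timedelta(hours=3, minutes=30),
}
overlap_time = {"B": timedelta(hours=1, minutes=30)}

# Names with no overlap row keep their full island time.
duration = {
    name: total - overlap_time.get(name, timedelta(0))
    for name, total in island_times.items()
}
print(duration)
```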
    