Get total time interval from multiple rows if sequence not broken

后端 未结 2 983
傲寒
傲寒 2021-01-21 19:31

I have Work and Person tables (these are just examples to understand problem).

Structure

Work table

id         


        
相关标签:
2条回答
  • 2021-01-21 19:45

    There may be a way to do this in one SQL select, but it escapes me. I managed to do it with one stored function though. Here's what I did for my testing:

    create table work
    (id integer, start_date date, end_date date);
    
    insert into work values (1, '2011-01-01','2011-02-02');
    insert into work values (1, '2011-02-02','2011-04-04');
    insert into work values (1, '2011-06-06','2011-09-09');
    insert into work values (2, '2011-01-01','2011-02-02');
    insert into work values (2, '2011-02-02','2011-03-03');
    
    create or replace function get_data() returns setof work as
    $body$
    declare
        res work%rowtype;
        sd  date := null;
    begin
        for res in
            select
                w1.id,
                case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
                case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
            from
                work w1
            order by
                id, start_date, end_date
        loop
            if res.start_date is not null and res.end_date is not null then
                return next res;
            elsif res.start_date is not null then
                sd := res.start_date;
            elsif res.end_date is not null then
                res.start_date := sd;
                return next res;
            end if;
        end loop;
    
        return;
    end;$body$
      language 'plpgsql';
    

    Then

    select * from get_data() order by id, start_date;
    

    returned this result:

     id | start_date |  end_date
    ----+------------+------------
      1 | 2011-01-01 | 2011-04-04
      1 | 2011-06-06 | 2011-09-09
      2 | 2011-01-01 | 2011-03-03
    (3 rows)
    

    which is, I think, what you're after.

    0 讨论(0)
  • 2021-01-21 20:09

    You could try postgres's WITH RECURSIVE construct. (after all, a linked list is a kind of tree) Getting the boundary conditions right will be a problem, but at least it would solve the problem without the need for loops.

    UPDATE: Added code. The problem with RECURSIVE is that you can only specify only the "tail" boundary condition. To specify the "head" condition, you need to wrap it into a view.

    CREATE VIEW collected_time AS (
        WITH RECURSIVE ztree(person_id, dt_from, dt_to)  AS ( 
        -- Terminal part
        SELECT  pr.person_id, pr.dt_from, pr.dt_to
        FROM prikklok pr
        WHERE NOT EXISTS (
                SELECT * FROM prikklok px
                WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
                )
        UNION
        -- Recursive part
        SELECT  p1.person_id AS person_id
        , p1.dt_from AS dt_from
        , p2.dt_to AS dt_to
        FROM prikklok AS p1
        , ztree AS p2
        WHERE p1.person_id = p2.person_id
        AND p1.dt_to = p2.dt_from
        )
    SELECT *
    FROM ztree zt
    WHERE NOT EXISTS (select *
        FROM prikklok p3
        WHERE p3.person_id = zt.person_id
            AND p3.dt_to = zt.dt_from
                )
        );
    
    SELECT * FROM collected_time;
    

    -- now generate some data with gaps

    INSERT INTO prikklok
        SELECT serie_n
        , serie_t
        , serie_t + '1 month'::interval
    FROM generate_series (1,10) serie_n
        , generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
        ;
    
    DELETE FROM prikklok
    WHERE random() <  0.001
        ;
    
    -- a few indexes won't hurt
    ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
        ;
    CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);
    

    The resulting query plan looks perfect:

                                                                                  QUERY PLAN                                                                           
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop Anti Join  (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
       CTE ztree
         ->  Recursive Union  (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
               ->  Merge Anti Join  (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
                     Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
                     ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok pr  (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
                     ->  Index Scan using prikklok_pkey on prikklok px  (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
               ->  Nested Loop  (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
                     ->  WorkTable Scan on ztree p2  (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
                     ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p1  (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
                           Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
       ->  CTE Scan on ztree zt  (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
       ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p3  (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
             Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
     Total runtime: 41.354 ms
    (15 rows)
    
    0 讨论(0)
提交回复
热议问题