GROUP BY consecutive dates delimited by gaps

忘了有多久 2021-01-04 10:42

Assume you have (in Postgres 9.1) a table like this:

date | value 

which has some gaps in it (I mean: not every possible date between min

2 answers
  • 2021-01-04 11:30
    create table t ("date" date, "value" int);
    insert into t ("date", "value") values
        ('2011-10-31', 2),
        ('2011-11-01', 8),
        ('2011-11-02', 10),
        ('2012-09-13', 1),
        ('2012-09-14', 4),
        ('2012-09-15', 5),
        ('2012-09-16', 20),
        ('2012-10-30', 10);
    

    Simpler and cheaper version:

    select min("date"), max("date"), sum(value)
    from (
        select
            "date", value,
            "date" - (dense_rank() over(order by "date"))::int g
        from t
    ) s
    group by s.g
    order by 1
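
    Why this works: within a run of consecutive dates, the date and its dense_rank() both advance by one per distinct date, so their difference stays constant. A gap makes the date jump ahead of the rank, which changes the difference. As a sketch, running the inner query on its own against the sample table t above should show one constant g per run (the alias g and the column names are the ones used above; the commented values are worked out by hand from the sample rows):

    ```sql
    -- Inner query only: g is the date minus its rank, constant within each run.
    select
        "date", value,
        "date" - (dense_rank() over (order by "date"))::int as g
    from t
    order by "date";

    --     date    | value |     g
    -- ------------+-------+------------
    --  2011-10-31 |     2 | 2011-10-30
    --  2011-11-01 |     8 | 2011-10-30
    --  2011-11-02 |    10 | 2011-10-30
    --  2012-09-13 |     1 | 2012-09-09
    --  2012-09-14 |     4 | 2012-09-09
    --  2012-09-15 |     5 | 2012-09-09
    --  2012-09-16 |    20 | 2012-09-09
    --  2012-10-30 |    10 | 2012-10-22
    ```

    The value of g has no meaning by itself; it only serves as a label that is shared by all rows of one run.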
    

    My first try was more complex and expensive:

    create temporary sequence s;
    select min("date"), max("date"), sum(value)
    from (
        select 
            "date", value, d,
            case 
                when lag("date", 1, null) over(order by s.d) is null and "date" is not null 
                    then nextval('s')
                when lag("date", 1, null) over(order by s.d) is not null and "date" is not null 
                    then lastval()
                else 0 
            end g
        from 
            t
            right join
            generate_series(
                (select min("date") from t)::date, 
                (select max("date") from t)::date + 1, 
                '1 day'
            ) s(d) on s.d::date = t."date"
    ) q
    where g != 0
    group by g
    order by 1
    ;
    drop sequence s;
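
    For comparison, a gap can also be detected without a sequence or generate_series. This is a different technique from the two queries above, offered only as a sketch: flag each row whose date is not exactly one day after the previous row's date, then take a running sum of the flags as the group number. It assumes the dates in t are unique (a duplicated date would be flagged as a new run); the aliases is_start, flagged and grouped are made up for illustration.

    ```sql
    select min("date"), max("date"), sum(value)
    from (
        select
            "date", value,
            -- running count of run starts = group number
            sum(is_start) over (order by "date") as g
        from (
            select
                "date", value,
                -- date - date yields an integer number of days in Postgres;
                -- the first row has no predecessor (lag is null), so it
                -- falls into the else branch and starts a run
                case
                    when "date" - lag("date") over (order by "date") = 1
                        then 0
                    else 1
                end as is_start
            from t
        ) flagged
    ) grouped
    group by g
    order by 1;
    ```

    On the sample data this should produce the same three rows as the queries above.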
    

    The output:

        min     |    max     | sum 
    ------------+------------+-----
     2011-10-31 | 2011-11-02 |  20
     2012-09-13 | 2012-09-16 |  30
     2012-10-30 | 2012-10-30 |  10
    (3 rows)
    
  • 2021-01-04 11:33

    Here is a way of solving it.

    First, to find where each consecutive series begins, select the dates that have no row for the previous day:

    SELECT first.date
    FROM raw_data first
         LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
    WHERE prior_first.date IS NULL
    

    Likewise, for the end of each consecutive series, select the dates that have no row for the following day:

    SELECT last.date
    FROM raw_data last
         LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
    WHERE after_last.date IS NULL
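
    The same anti-join can be written with NOT EXISTS, which some readers find easier to follow. This is only a sketch of an equivalent formulation, assuming raw_data has the date and value columns from the question; the aliases r, p and n are made up for illustration.

    ```sql
    -- Beginnings: dates with no row for the day before.
    SELECT r.date
    FROM raw_data r
    WHERE NOT EXISTS (
        SELECT 1 FROM raw_data p WHERE p.date = r.date - 1
    )
    ORDER BY r.date;

    -- Endings: dates with no row for the day after.
    SELECT r.date
    FROM raw_data r
    WHERE NOT EXISTS (
        SELECT 1 FROM raw_data n WHERE n.date = r.date + 1
    )
    ORDER BY r.date;
    ```

    If raw_data held the sample rows from the first answer, the beginnings would be 2011-10-31, 2012-09-13 and 2012-10-30, and the endings 2011-11-02, 2012-09-16 and 2012-10-30. Note that 2012-10-30 is both a beginning and an ending, because it forms a run of a single day.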
    

    You might consider making these views, to simplify queries using them.

    Pairing each beginning with the earliest ending on or after it gives the group ranges:

    -- Dates that start a run: no row exists for the previous day.
    CREATE VIEW beginnings AS
    SELECT first.date
    FROM raw_data first
         LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
    WHERE prior_first.date IS NULL;
    
    -- Dates that end a run: no row exists for the following day.
    CREATE VIEW endings AS
    SELECT last.date
    FROM raw_data last
         LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
    WHERE after_last.date IS NULL;
    
    -- Match each beginning with the nearest ending (<= so that single-day
    -- runs are kept), then aggregate the rows that fall inside each range.
    SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value)
    FROM raw_data raw
      INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) AS hi_date
                  FROM beginnings lo, endings hi
                  WHERE lo.date <= hi.date
                  GROUP BY lo.date) range
         ON raw.date >= range.lo_date AND raw.date <= range.hi_date
    GROUP BY range.lo_date;
    