How to add a running count to rows in a 'streak' of consecutive days

匆匆过客 提交于 2019-11-30 22:23:45

Building on this table (not using the SQL keyword "date" as column name.):

CREATE TABLE tbl(
  pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);

Query:

SELECT pid, the_date
     , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM  (
   SELECT *
        , the_date - '2000-01-01'::date
        - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
   FROM   tbl
) sub
ORDER  BY pid, the_date;

Subtracting a date from another date yields an integer. Since you are looking for consecutive days, every next row would be greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to deal out number per group.

grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:

the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp

One multiplication, one subtraction. String concatenation and casting is more expensive. Test with EXPLAIN ANALYZE.

Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.

Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.

And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.

You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.

create table test (
  pid integer not null,
  date date not null,
  primary key (pid, date)
);

insert into test values
(1,'2014-10-1'), (1,'2014-10-2'), (1,'2014-10-3'), (1,'2014-10-5'),
(1,'2014-10-7'), (2,'2014-10-1'), (2,'2014-10-2'), (2,'2014-10-3'),
(2,'2014-10-5'), (2,'2014-10-7');

The principle is simple. A streak of distinct, consecutive dates minus row_number() is a constant. You can group by the constant, and take the dense_rank() over that result.

with grouped_dates as (
  select pid, date, 
         (date - (row_number() over (partition by pid order by date) || ' days')::interval)::date as grouping_date
  from test
)
select * , dense_rank() over (partition by grouping_date order by date) as in_streak
from grouped_dates
order by pid, date
pid  date         grouping_date  in_streak
--
1    2014-10-01   2014-09-30     1
1    2014-10-02   2014-09-30     2
1    2014-10-03   2014-09-30     3
1    2014-10-05   2014-10-01     1
1    2014-10-07   2014-10-02     1
2    2014-10-01   2014-09-30     1
2    2014-10-02   2014-09-30     2
2    2014-10-03   2014-09-30     3
2    2014-10-05   2014-10-01     1
2    2014-10-07   2014-10-02     1
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!