How can I identify groups of consecutive dates in SQL?

后端 未结 2 2052
夕颜
夕颜 2021-02-14 04:54

Im trying to write a function which identifies groups of dates, and measures the size of the group.

I\'ve been doing this procedurally in Python until now but I\'d like

相关标签:
2条回答
  • 2021-02-14 05:26

    You can do this with a clever application of window functions. Consider the following:

    select name, date, row_number() over (partition by name order by date)
    from t
    

    This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now, take the difference from the date, and you have a constant value for the group.

    select name, date,
           dateadd(d, - row_number() over (partition by name order by date), date) as val
    from t
    

    Finally, you want the number of groups in sequence. I would also add a group identifier (for instance, to distinguish between the last two).

    select name, date,
           count(*) over (partition by name, val) as NumInSeq,
           dense_rank() over (partition by name order by val) as SeqID
    from (select name, date,
                 dateadd(d, - row_number() over (partition by name order by date), date) as val
          from t
         ) t
    

    Somehow, I missed the part about weekdays and holidays. This solution does not solve that problem.

    0 讨论(0)
  • 2021-02-14 05:33

    The following query account the weekends and holidays. The query has a provision to include the holidays on-the-fly, though for the purpose of making the query clearer, I just materialized the holidays to an actual table.

    CREATE TABLE tx
        (n varchar(4), d date);
    
    INSERT INTO tx
        (n, d)
    VALUES
        ('Bill', '2006-12-29'), -- Friday    
        -- 2006-12-30 is Saturday
        -- 2006-12-31 is Sunday
    
        -- 2007-01-01 is New Year's Holiday    
        ('Bill', '2007-01-02'), -- Tuesday
        ('Bill', '2007-01-03'), -- Wednesday
        ('Bill', '2007-01-04'), -- Thursday
        ('Bill', '2007-01-05'), -- Friday    
        -- 2007-01-06 is Saturday
        -- 2007-01-07 is Sunday
    
        ('Bill', '2007-01-08'), -- Monday
        ('Bill', '2007-01-09'), -- Tuesday
    
        ('Bill', '2012-07-09'), -- Monday
        ('Bill', '2012-07-10'), -- Tuesday
        ('Bill', '2012-07-11'); -- Wednesday
    
    create table holiday(d date);
    insert into holiday(d) values
    ('2007-01-01');
    
    
    /* query should return 7 consecutive good 
       attendance(from December 29 2006 to January 9 2007) */    
    /* and 3 consecutive attendance from July 7 2012 to July 11 2012. */
    

    Query:

    with first_date as
    (
        -- get the monday of the earliest date
        select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date 
        from tx 
    )
    ,shifted as
    (
        select 
            tx.n, tx.d,                     
            diff = datediff(day, fd.first_date, tx.d) 
                       - (datediff(day, fd.first_date, tx.d)/7 * 2)             
        from tx
        cross join first_date fd
        union
        select 
            xxx.n, h.d,             
            diff = datediff(day, fd.first_date, h.d) 
                       - (datediff(day, fd.first_date, h.d)/7 * 2) 
        from holiday h 
        cross join first_date fd
        cross join (select distinct n from tx) as xxx
    )
    ,grouped as
    (
        select *, grp = diff - row_number() over(partition by n order by d)
        from shifted
    )
    select 
        d, n, dense_rank() over (partition by n order by grp) as nth_streak
        ,count(*) over (partition by n, grp) as streak
    from grouped
    where d not in (select d from holiday)  -- remove the holidays
    

    Output:

    |          D |    N | NTH_STREAK | STREAK |
    -------------------------------------------
    | 2006-12-29 | Bill |          1 |      7 |
    | 2007-01-02 | Bill |          1 |      7 |
    | 2007-01-03 | Bill |          1 |      7 |
    | 2007-01-04 | Bill |          1 |      7 |
    | 2007-01-05 | Bill |          1 |      7 |
    | 2007-01-08 | Bill |          1 |      7 |
    | 2007-01-09 | Bill |          1 |      7 |
    | 2012-07-09 | Bill |          2 |      3 |
    | 2012-07-10 | Bill |          2 |      3 |
    | 2012-07-11 | Bill |          2 |      3 |
    

    Live test: http://www.sqlfiddle.com/#!3/815c5/1

    The main logic of the query is to shift all the dates two days back. This is done by dividing the date to 7 and multiplying it by two, then subtracting it from the original number. For example, if a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6.

                      Weekends are not in attendance(e.g. 6,7,13,14)
     1  2  3  4  5     6  7
     8  9 10 11 12    13 14
    15
    

    Applying the computation to all the weekday numbers will yield these values:

     1  2  3  4  5    
     6  7  8  9 10    
    11
    

    For holidays, you need to slot them on attendance, so to the consecutive-ness could be easily determined, then just remove them from the final query. The above attendance yields 11 consecutive good attendance.

    Query logic's detailed explanation here: http://www.ienablemuch.com/2012/07/monitoring-perfect-attendance.html

    0 讨论(0)
提交回复
热议问题