SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

一整个雨季 2021-01-19 01:52

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities. I have a data

3 Answers
  •  臣服心动
    2021-01-19 02:04

    This is one way to do it:

    ;WITH CTErn AS (
       -- Number each client's activities in start-date order
       SELECT activity_client_id, activity_type,
              activity_start_date, activity_end_date,
              ROW_NUMBER() OVER (PARTITION BY activity_client_id 
                                 ORDER BY activity_start_date) AS rn
       FROM activities
    ),   
    CTEdiff AS (
       -- Pair each activity with the same client's previous one and count
       -- only the minutes that fall after the previous activity ended
       SELECT c1.activity_client_id, c1.activity_type,
              x.activity_start_date, c1.activity_end_date,
              DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
              ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id 
                                 ORDER BY x.activity_start_date) AS seq
       FROM CTErn AS c1
       LEFT JOIN CTErn AS c2 ON c1.activity_client_id = c2.activity_client_id
                            AND c1.rn = c2.rn + 1
       CROSS APPLY (SELECT CASE 
                              WHEN c1.activity_start_date < c2.activity_end_date
                                 THEN c2.activity_end_date
                              ELSE c1.activity_start_date
                           END) x(activity_start_date)    
    )
    -- Running total of hours per client; keep the first row that reaches 5 hours.
    -- TOP 1 returns one row overall, so filter to a single client if needed.
    SELECT TOP 1 c2.client_id, c2.client_sign_up_date, c1.activity_start_date, 
                 x.hoursOfActivity               
    FROM CTEdiff AS c1
    INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id                     
    CROSS APPLY (SELECT SUM(diff) / 60.0
                 FROM CTEdiff AS c3
                 WHERE c3.activity_client_id = c1.activity_client_id
                   AND c3.seq <= c1.seq) x(hoursOfActivity)
    WHERE x.hoursOfActivity >= 5
    ORDER BY c1.seq
    

    Common Table Expressions, ROW_NUMBER() and CROSS APPLY were all introduced in SQL Server 2005, so the above query should work on that version and later.


    The first CTE, i.e. CTErn, produces the following output:

    client_id   activity_type   start_date          end_date          rn
    112         Interview       2015-06-01 09:00    2015-06-01 11:00  1
    112         CV updating     2015-06-01 09:30    2015-06-01 11:30  2
    112         Course          2015-06-02 09:00    2015-06-02 16:00  3
    112         Interview       2015-06-03 09:00    2015-06-03 10:00  4
    

    The second CTE, i.e. CTEdiff, uses the above table expression to calculate the time difference in minutes for each record, taking into account any overlap with the previous record:

    client_id activity_type start_date       end_date         diff  seq
    112       Interview     2015-06-01 09:00 2015-06-01 11:00 120   1
    112       CV updating   2015-06-01 11:00 2015-06-01 11:30 30    2
    112       Course        2015-06-02 09:00 2015-06-02 16:00 420   3
    112       Interview     2015-06-03 09:00 2015-06-03 10:00 60    4
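
    To check the diff column by hand, the clipping rule can be reproduced outside the database. The following is a minimal Python sketch, not part of the original query: the function name clipped_minutes is hypothetical, and the rows are the sample rows for client 112 shown above, assumed to be already sorted by start date.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Sample rows for client 112, already ordered by start date (as CTErn does).
rows = [
    ("Interview",   "2015-06-01 09:00", "2015-06-01 11:00"),
    ("CV updating", "2015-06-01 09:30", "2015-06-01 11:30"),
    ("Course",      "2015-06-02 09:00", "2015-06-02 16:00"),
    ("Interview",   "2015-06-03 09:00", "2015-06-03 10:00"),
]

def clipped_minutes(rows):
    """Minutes per activity after moving its start past the previous row's end."""
    result = []
    prev_end = None
    for _type, start_s, end_s in rows:
        start = datetime.strptime(start_s, FMT)
        end = datetime.strptime(end_s, FMT)
        # Same rule as the CASE expression: if this row starts before the
        # previous row ended, count it only from the previous row's end.
        if prev_end is not None and start < prev_end:
            start = prev_end
        result.append(int((end - start).total_seconds() // 60))
        prev_end = end  # the previous row's original end, as in the self-join
    return result

diffs = clipped_minutes(rows)
print(diffs)  # [120, 30, 420, 60]
```

    The printed list matches the diff column of the CTEdiff output.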
    

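    The final query calculates the cumulative sum of these differences and selects the first record at which the total reaches 5 hours of activity. For the sample data the running totals are 2, 2.5, 9.5 and 10.5 hours, so the Course record (seq 3) is returned.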
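
    The running-total step can be sketched the same way. This is an illustrative Python version only, assuming the diff values from the CTEdiff output above; first_seq_reaching is a hypothetical helper name, not something from the query.

```python
from itertools import accumulate

# Clipped durations in minutes, copied from the diff column of CTEdiff.
diffs = [120, 30, 420, 60]
threshold_hours = 5

def first_seq_reaching(diffs, threshold_hours):
    """Return (seq, hours) for the first row whose running total meets the threshold."""
    for seq, total in enumerate(accumulate(diffs), start=1):
        hours = total / 60.0
        if hours >= threshold_hours:
            return seq, hours
    return None  # threshold never reached

result = first_seq_reaching(diffs, threshold_hours)
print(result)  # (3, 9.5)
```

    Note that this identifies the row during which the threshold is crossed, just as the query returns that row's activity_start_date, rather than the exact moment the 5 hours are completed.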

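    The above query handles only simple interval overlaps, i.e. when the end of one activity overlaps the start of the next. Each record is compared with its immediate predecessor only, so an activity that lies entirely inside an earlier one, or that overlaps an activity further back, is not handled correctly.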
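
    For data with nested or chained overlaps, one possible approach is to track the latest end time seen so far instead of only the previous row's end. The sketch below is a swapped-in technique written in Python for checking results, not a translation of the query above; time_threshold_reached is a hypothetical name. Unlike the query, it returns the exact moment at which the threshold is met.

```python
from datetime import datetime, timedelta

def time_threshold_reached(intervals, threshold):
    """Return the moment cumulative non-overlapping activity time reaches `threshold`.

    intervals: iterable of (start, end) datetimes, in any order.
    threshold: timedelta of required activity time.
    Returns None if the threshold is never reached.
    """
    total = timedelta(0)
    covered_until = None  # latest end time counted so far
    for start, end in sorted(intervals):
        if covered_until is not None and start < covered_until:
            start = covered_until  # skip the part already counted
        if end <= start:
            continue  # fully inside earlier activity: adds nothing
        if total + (end - start) >= threshold:
            return start + (threshold - total)
        total += end - start
        covered_until = end
    return None

# The sample intervals for client 112 from the tables above.
sample = [
    (datetime(2015, 6, 1, 9, 0),  datetime(2015, 6, 1, 11, 0)),
    (datetime(2015, 6, 1, 9, 30), datetime(2015, 6, 1, 11, 30)),
    (datetime(2015, 6, 2, 9, 0),  datetime(2015, 6, 2, 16, 0)),
    (datetime(2015, 6, 3, 9, 0),  datetime(2015, 6, 3, 10, 0)),
]
reached = time_threshold_reached(sample, timedelta(hours=5))
print(reached)  # 2015-06-02 11:30:00
```

    With the sample data, 2.5 hours are accumulated on the first day, so the remaining 2.5 hours are completed 2.5 hours into the Course activity.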
