SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

一整个雨季 2021-01-19 01:52

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities. I have a data

3 answers
  •  北荒 (original poster)
    2021-01-19 02:14

    A Geometric Approach

    For another issue, I've taken a geometric approach to date packing: I convert dates and times to SQL Server's geometry type and use geometry::UnionAggregate to merge the ranges.

    This will not work in SQL Server 2005: the geometry type was introduced in SQL Server 2008, and geometry::UnionAggregate and lag in SQL Server 2012. But your problem was such an interesting puzzle that I wanted to see whether the geometric approach would work, so future readers who run into this problem and have access to a later version can consider it.

    Code Description

    In 'numbers':

    • I build a table representing a sequence of integers (1, 2, 3, ...).
    • Swap it out with your favorite way to make a numbers table.
    • Merging ranges never yields more rows than the original table has, so I use the original table as the base for the sequence.
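
    The query below reads from a table variable named @activities, which the post does not define. A minimal sketch of what it might look like, assuming the three column names the query uses; the data types and sample rows are invented for illustration:

    ```sql
    -- Hypothetical sample data: column names follow the query, values are invented.
    declare @activities table (
        activity_client_id   int,
        activity_start_date  datetime,
        activity_end_date    datetime
    );

    insert into @activities (activity_client_id, activity_start_date, activity_end_date)
    values
        (1, '2021-01-04T09:00:00', '2021-01-04T11:00:00'),  -- 120 minutes
        (1, '2021-01-04T10:00:00', '2021-01-04T12:00:00'),  -- overlaps the row above
        (1, '2021-01-05T09:00:00', '2021-01-05T12:00:00');  -- 180 minutes, no overlap

    -- Same shape as the 'numbers' CTE: one sequence value per source row.
    select  i = row_number() over (order by (select null))
    from    @activities;
    ```

    A table variable only lives for the duration of its batch, so the declaration, the insert, and the main query would have to run together.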

    In 'mergeLines':

    • I convert the dates to floats and use those floats to create geometrical points.
    • I then connect these points via STUnion and STEnvelope.
    • Finally, I merge all these lines via UnionAggregate. The resulting 'lines' geometry object might contain multiple lines, but if they overlap, they turn into one line.
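
    To see the merge step in isolation, here is a small sketch that applies the same calls (geometry::Point, STUnion, STEnvelope, geometry::UnionAggregate) to hypothetical ranges on a plain number line instead of dates. The values 0-2, 1-3, and 5-6 are made up: the first two overlap and the third stands alone.

    ```sql
    -- A datetime converts to a float counting days, with the time as the fraction.
    select  as_float = convert(float, convert(datetime, '2021-01-04T12:00:00'));

    -- Hypothetical ranges: 0-2 and 1-3 overlap, 5-6 does not touch either.
    select  merged_text = m.lines.ToString(),
            parts       = m.lines.STNumGeometries()
    from    (
                select      lines = geometry::UnionAggregate(line)
                from        (values (0.0, 2.0), (1.0, 3.0), (5.0, 6.0)) r(x1, x2)
                cross apply (select
                                startP = geometry::Point(r.x1, 0, 0),
                                stopP  = geometry::Point(r.x2, 0, 0)
                            ) pointify
                cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
            ) m;
    ```

    If the approach behaves as described above, parts should count the overlapping pair as a single piece and the isolated range as a second one.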

    In 'redate':

    • I use the numbers CTE to extract the individual lines inside 'lines'.
    • I take the envelope of each line, which here ensures that each line is stored only as its two endpoints.
    • I read the endpoint x values and convert them back to their time representations (This is usually the end goal, but you need more).
    • I calculate the difference in minutes between activity start and end dates (I compute it in seconds first, then divide by 60, to sidestep a precision issue).
    • I calculate the cumulative sum of these minutes for each row.
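
    Two of these steps can be tried on their own. The sketch below first shows one plausible reason to measure in seconds (an assumption about the precision issue mentioned above): datediff counts the boundaries crossed, not the time elapsed. It then runs the same windowed sum over two hypothetical merged ranges of 180 minutes each.

    ```sql
    -- One elapsed second that happens to cross a minute boundary.
    select  by_minute  = datediff(minute, '2021-01-04T09:00:59', '2021-01-04T09:01:00'),              -- 1
            by_seconds = round(datediff(s, '2021-01-04T09:00:59', '2021-01-04T09:01:00') / 60.0, 0);  -- rounds to 0

    -- Running total over hypothetical merged ranges (invented values).
    select  start_date,
            minutes,
            rollingMinutes = sum(minutes) over (
                order by start_date
                rows between unbounded preceding and current row
            )
    from    (values
                (convert(datetime, '2021-01-04T09:00:00'), 180),
                (convert(datetime, '2021-01-05T09:00:00'), 180)
            ) m(start_date, minutes);
    ```

    With these two rows the running total is 180 and then 360, so the five-hour (300-minute) mark falls inside the second range.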

    In the outer query:

    • I align the previous cumulative minutes sum with each current row
    • I filter for the row where the 5hr goal was met but where the previous row's cumulative minutes were still below the goal.
    • I then calculate where in the current row's range the user has met the 5 hours, to not only arrive at the date the five hour goal was met, but the exact time.
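
    As a worked example of that last step, suppose (hypothetically) that a client had 180 cumulative minutes before a merged range starting at 09:00 on 2021-01-05. The remaining 300 - 180 = 120 minutes are added to the start of that range:

    ```sql
    -- Hypothetical inputs: 180 minutes banked before the current range begins.
    declare @prevRoll    int      = 180;
    declare @range_start datetime = '2021-01-05T09:00:00';

    -- Minutes still missing from the 5-hour goal, counted from the range start.
    select  met_5hr_goal = dateadd(minute, (60 * 5) - @prevRoll, @range_start);
    -- 2021-01-05 11:00:00.000
    ```

    This relies on the ranges having been merged first: within a single merged range there are no gaps or double-counted overlaps, so elapsed minutes map directly onto the clock.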

    The Code

    with
    
        numbers as (
    
            select  row_number() over (order by (select null)) i 
            from    @activities -- where I put your data
    
        ),
    
        mergeLines as (
    
            select      activity_client_id,
                        lines = geometry::UnionAggregate(line)
            from        @activities
            cross apply (select 
                            startP = geometry::Point(convert(float,activity_start_date), 0, 0),
                            stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
                        ) pointify
            cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
            group by    activity_client_id
    
        ),
    
        redate as (
    
            select      client_id = activity_client_id, 
                        activities_start_date,
                        activities_end_date,
                        minutes,
    
                        rollingMinutes = sum(minutes) over(
                            partition by activity_client_id 
                            order by activities_start_date 
                            rows between unbounded preceding and current row
                        )
    
            from        mergeLines ml
            join        numbers n on n.i between 1 and ml.lines.STNumGeometries()
            cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
            cross apply (select 
                            activities_start_date = convert(datetime, l.line.STPointN(1).STX),
                            activities_end_date = convert(datetime, l.line.STPointN(3).STX)
                        ) unprepare
            cross apply (select minutes = 
                            round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
                        ) duration
    
        )
    
        select      client_id,
                    activities_start_date,
                    activities_end_date,
                    met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date) 
        from        (
                        select  *,
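                                -- default of 0 keeps a client's first merged range from being
                                -- filtered out (null < 300) when it alone reaches the goal
                                prevRoll = lag(rollingMinutes, 1, 0) over (
                                    partition by client_id 
                                    order by rollingMinutes
                                )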
                        from    redate 
                    ) ranker
        where       rollingMinutes >= 60 * 5
        and         prevRoll < 60 * 5;
    
