How to merge time intervals in SQL Server

再見小時候 2021-01-03 01:43

Suppose I have the following event table with personId, startDate and endDate.

I want to know how much time the person X sp

7 answers
  • 2021-01-03 02:13

    Algebra. If B_n is the ending time of the nth event and A_n is the starting time of the nth event, then the sum of the differences is the difference of the sums: (B_1 - A_1) + ... + (B_n - A_n) = (B_1 + ... + B_n) - (A_1 + ... + A_n). So you can write

    select everything else, sum(cast(endDate as int)) - sum(cast(startDate as int)) as daysSpent
    

    This works if your dates have no time component. Otherwise, you could cast to a real instead of an int.
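
    To see the identity on concrete rows, here is a minimal sketch. The table variable @Events and its two rows are made up for illustration; only the column names come from the question.

    ```sql
    -- Hypothetical sample data (assumption): two non-overlapping events for one person.
    DECLARE @Events TABLE
    (
        personId  INT      NOT NULL,
        startDate DATETIME NOT NULL,
        endDate   DATETIME NOT NULL
    );

    INSERT INTO @Events (personId, startDate, endDate) VALUES
    (1, '2011-01-01', '2011-01-05'),
    (1, '2011-01-11', '2011-01-15');

    SELECT  personId,
            -- difference of the sums (the expression from the answer)
            SUM(CAST(endDate AS INT)) - SUM(CAST(startDate AS INT)) AS daysBySums,
            -- sum of the differences, for comparison
            SUM(DATEDIFF(DAY, startDate, endDate))                  AS daysByDiffs
    FROM    @Events
    GROUP BY personId;
    ```

    Both columns come out as 8 for person 1. Either expression counts a day twice when two events of the same person overlap, so this only gives the merged total when the intervals do not overlap.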

  • 2021-01-03 02:21

    Edit 1: I have modified both solutions to get correct results.

    Edit 2: I have done comparative tests using the solutions proposed by Mikael Eriksson, Conrad Frix, Philip Kelley and me. All tests use an EventTable with the following structure:

    CREATE TABLE EventTable
    (
         EventID    INT IDENTITY PRIMARY KEY
        ,PersonId   INT NOT NULL
        ,StartDate  DATETIME NOT NULL
        ,EndDate    DATETIME NOT NULL
        ,CONSTRAINT CK_StartDate_Before_EndDate CHECK(StartDate < EndDate)
    );
    

    Also, all tests use a warm buffer (no DBCC DROPCLEANBUFFERS) and a cold plan cache (I executed DBCC FREEPROCCACHE before every test). Because some solutions use a filter (PersonId = 1) and others do not, I inserted rows into EventTable for only one person (INSERT ...(PersonId,...) VALUES (1,...)).
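
    The setup described above could be scripted roughly as follows. This is only a sketch: the sample rows are made up, and SET STATISTICS TIME/IO is an assumption about how timings might be collected, since the post does not say which tool was used.

    ```sql
    -- Hypothetical rows (assumption): every event belongs to PersonId = 1, as in the tests.
    INSERT INTO EventTable (PersonId, StartDate, EndDate)
    VALUES (1, '20110101', '20110105'),
           (1, '20110104', '20110108'),
           (1, '20110111', '20110115');

    -- Cold plan cache: discard all cached plans so the next query is compiled again.
    -- This affects the whole instance, so run it on a test server only.
    DBCC FREEPROCCACHE;

    -- Warm buffer: DBCC DROPCLEANBUFFERS is deliberately NOT executed,
    -- so data pages that are already in memory stay there.

    SET STATISTICS TIME ON;  -- assumption: report CPU and elapsed time
    SET STATISTICS IO ON;    -- assumption: report logical and physical reads

    -- Run the solution being measured here; this query is only a placeholder.
    SELECT COUNT(*) FROM EventTable WHERE PersonId = 1;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    ```

    With only these three rows in the table, the merged intervals are 1 to 8 January and 11 to 15 January, so a correct solution should report 11 days for PersonId 1.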

    These are the results: [image: comparative test results]

    My solutions use recursive CTEs.

    Solution 1:

    WITH BaseCTE
    AS
    (
        SELECT   e.StartDate
                ,e.EndDate
                ,e.PersonId
                ,ROW_NUMBER() OVER(PARTITION BY e.PersonId ORDER BY e.StartDate, e.EndDate) RowNumber
        FROM    EventTable e
    ),  RecursiveCTE
    AS
    (
        SELECT   b.PersonId
                ,b.RowNumber
    
                ,b.StartDate
                ,b.EndDate
                ,b.EndDate AS MaxEndDate
                ,1 AS PseudoDenseRank
        FROM    BaseCTE b
        WHERE   b.RowNumber = 1
        UNION ALL
        SELECT   crt.PersonId
                ,crt.RowNumber
    
                ,crt.StartDate
                ,crt.EndDate
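                -- Running maximum of the end dates seen so far for this person
                ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
                -- Keep the interval ID while the row overlaps or touches the running interval; otherwise start a new one
                ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END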
        FROM    RecursiveCTE prev
        INNER JOIN BaseCTE crt ON prev.PersonId = crt.PersonId
        AND     prev.RowNumber + 1 = crt.RowNumber
    ),  SumDaysPerPersonAndInterval
    AS
    (
        SELECT   src.PersonId
                ,src.PseudoDenseRank --Interval ID
                ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
        FROM    RecursiveCTE src
        GROUP BY src.PersonId, src.PseudoDenseRank
    )
    SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
    FROM    SumDaysPerPersonAndInterval x
    GROUP BY x.PersonId
    OPTION(MAXRECURSION 32767);
    

    Solution 2:

    DECLARE @Base TABLE --or a temporary table: CREATE TABLE #Base (...) 
    (
         PersonID   INT NOT NULL
        ,StartDate  DATETIME NOT NULL
        ,EndDate    DATETIME NOT NULL
        ,RowNumber  INT NOT NULL
        ,PRIMARY KEY(PersonID, RowNumber)
    );
    INSERT  @Base (PersonID, StartDate, EndDate, RowNumber)
    SELECT   e.PersonId
            ,e.StartDate
            ,e.EndDate
            ,ROW_NUMBER() OVER(PARTITION BY e.PersonID ORDER BY e.StartDate, e.EndDate) RowNumber
    FROM    EventTable e;
    
    WITH RecursiveCTE
    AS
    (
        SELECT   b.PersonId
                ,b.RowNumber
    
                ,b.StartDate
                ,b.EndDate
                ,b.EndDate AS MaxEndDate
                ,1 AS PseudoDenseRank
        FROM    @Base b
        WHERE   b.RowNumber = 1
        UNION ALL
        SELECT   crt.PersonId
                ,crt.RowNumber
    
                ,crt.StartDate
                ,crt.EndDate
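                -- Running maximum of the end dates seen so far for this person
                ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
                -- Keep the interval ID while the row overlaps or touches the running interval; otherwise start a new one
                ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END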
        FROM    RecursiveCTE prev
        INNER JOIN @Base crt ON prev.PersonId = crt.PersonId
        AND     prev.RowNumber + 1 = crt.RowNumber
    ),  SumDaysPerPersonAndInterval
    AS
    (
        SELECT   src.PersonId
                ,src.PseudoDenseRank --Interval ID
                ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
        FROM    RecursiveCTE src
        GROUP BY src.PersonId, src.PseudoDenseRank
    )
    SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
    FROM    SumDaysPerPersonAndInterval x
    GROUP BY x.PersonId
    OPTION(MAXRECURSION 32767);
    
  • 2021-01-03 02:25

    The following SQL is for the three scenarios you've described

    with sampleData 
    AS (
    
    
        SELECT       1 personid,1 startDate,4 endDate
        UNION SELECT 1,3,5
        UNION SELECT 2,1,3
        UNION SELECT 2,6,9
        UNION SELECT 3,1,5 
        UNION SELECT 3,4,8
        UNION SELECT 3,11, 15
    
    ), 
         cte 
         AS (SELECT personid, 
                    startdate, 
                    enddate, 
                    Row_number() OVER(ORDER BY personid, startdate) AS rn 
             FROM   sampledata), 
         overlaps 
         AS (SELECT a.personid, 
                    a.startdate, 
                    b.enddate, 
                    a.rn id1, 
                    b.rn id2 
             FROM   cte a 
                    INNER JOIN cte b 
                      ON a.personid = b.personid 
                         AND a.enddate > b.startdate 
                         AND a.rn = b.rn - 1), 
         nooverlaps 
         AS (SELECT a.personid, 
                    a.startdate, 
                    a.enddate 
             FROM   cte a 
                    LEFT JOIN overlaps b 
                      ON a.rn = b.id1 
                          OR a.rn = b.id2 
             WHERE  b.id1 IS NULL) 
    SELECT personid, 
           SUM(timespent) timespent 
    FROM   (SELECT personid, 
                   enddate - startdate timespent 
            FROM   nooverlaps 
            UNION ALL -- plain UNION would drop a duration that occurs twice for the same person
            SELECT personid, 
                   enddate - startdate 
            FROM   overlaps) t 
    GROUP  BY personid 
    

    Produces this result

    Personid    timeSpent
    ----------- -----------
    1           4
    2           5
    3           11
    

    Note: I used simple integers, but DATEDIFF should work too.

    Correctness issue: as Cheran S noted, if your data is allowed to contain multiple (chained) overlaps, the results won't be correct and you should use one of the other answers instead. His example used [1,5], [4,8], [7,11] for the same person ID.
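
    For data with chained overlaps such as [1,5], [4,8], [7,11], one possible approach (not taken from any of the answers here) is to flag the start of each merged interval using a running maximum of the earlier end values, then group on a running count of those flags. A minimal sketch, assuming SQL Server 2012 or later for the ROWS window frame; the names marked, grouped, merged, isNewGroup and groupId are made up for illustration:

    ```sql
    WITH sampleData AS
    (
        -- Hypothetical integer sample data in the style used above
        SELECT 1 AS personid, 1 AS startDate, 5 AS endDate
        UNION ALL SELECT 1, 4, 8
        UNION ALL SELECT 1, 7, 11
        UNION ALL SELECT 2, 1, 3
        UNION ALL SELECT 2, 6, 9
    ),
    marked AS
    (
        SELECT  personid, startDate, endDate,
                -- 1 when the row starts after every earlier interval of the same
                -- person has ended (the first row has no earlier rows, so MAX is NULL)
                CASE WHEN startDate <= MAX(endDate) OVER (PARTITION BY personid
                                                          ORDER BY startDate, endDate
                                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                     THEN 0 ELSE 1 END AS isNewGroup
        FROM    sampleData
    ),
    grouped AS
    (
        SELECT  personid, startDate, endDate,
                -- Running count of group starts = ID of the merged interval
                SUM(isNewGroup) OVER (PARTITION BY personid
                                      ORDER BY startDate, endDate
                                      ROWS UNBOUNDED PRECEDING) AS groupId
        FROM    marked
    ),
    merged AS
    (
        SELECT  personid, groupId,
                MIN(startDate) AS startDate,
                MAX(endDate)   AS endDate
        FROM    grouped
        GROUP BY personid, groupId
    )
    SELECT  personid, SUM(endDate - startDate) AS timespent
    FROM    merged
    GROUP BY personid
    ORDER BY personid;
    ```

    For this sample, person 1 collapses into the single interval [1,11] and gets timespent = 10, while person 2 keeps two separate intervals and gets 5.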

  • 2021-01-03 02:26

    You can use a recursive CTE to build a list of dates and then count the distinct dates.

    declare @T table
    (
      startDate date,
      endDate date
    );
    
    insert into @T values
    ('2011-01-01', '2011-01-05'),
    ('2011-01-04', '2011-01-08'),
    ('2011-01-11', '2011-01-15');
    
    with C as
    (
      select startDate,
             endDate
      from @T
      union all
      select dateadd(day, 1, startDate),
             endDate
      from C
      where dateadd(day, 1, startDate) < endDate       
    )
    select count(distinct startDate) as DayCount
    from C
    option (MAXRECURSION 0)
    

    Result:

    DayCount
    -----------
    11
    

    Or you can use a numbers table. Here I use master..spt_values:

    declare @MinStartDate date
    select @MinStartDate = min(startDate)
    from @T
    
    select count(distinct N.number)
    from @T as T
      inner join master..spt_values as N
        on dateadd(day, N.Number, @MinStartDate) between T.startDate and dateadd(day, -1, T.endDate)
    where N.type = 'P'    
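
    Rows of master..spt_values with type = 'P' only hold the numbers 0 to 2047, so the query above cannot see days that fall more than 2047 days after @MinStartDate. A sketch of the same idea with an inline numbers list built from cross-joined CTEs follows; the names Ten, Numbers and @DaySpan are made up for illustration, and @T is declared again so the batch runs on its own:

    ```sql
    DECLARE @T TABLE (startDate DATE, endDate DATE);

    INSERT INTO @T VALUES
    ('2011-01-01', '2011-01-05'),
    ('2011-01-04', '2011-01-08'),
    ('2011-01-11', '2011-01-15');

    DECLARE @MinStartDate DATE, @DaySpan INT;

    -- Earliest start and the number of days covered by the whole data set
    SELECT  @MinStartDate = MIN(startDate),
            @DaySpan      = DATEDIFF(DAY, MIN(startDate), MAX(endDate))
    FROM    @T;

    WITH Ten AS
    (
        SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
    ),
    Numbers AS
    (
        -- 0 .. 99999, enough for well over 200 years of days
        SELECT a.n + 10 * b.n + 100 * c.n + 1000 * d.n + 10000 * e.n AS number
        FROM   Ten a CROSS JOIN Ten b CROSS JOIN Ten c CROSS JOIN Ten d CROSS JOIN Ten e
    )
    SELECT COUNT(DISTINCT N.number) AS DayCount
    FROM   @T AS T
      INNER JOIN Numbers AS N
        ON DATEADD(DAY, N.number, @MinStartDate) BETWEEN T.startDate AND DATEADD(DAY, -1, T.endDate)
    WHERE  N.number < @DaySpan;  -- only offsets inside the data's date range
    ```

    For the sample rows this returns 11, the same as the recursive version.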
    
  • 2021-01-03 02:36

    Try something like this

    select 
        personId, 
        sum(DateDuration) as TotalDuration
    from
    (
        select personId, datediff(dd, startDate, endDate) as DateDuration
        from yourEventTable
    ) a
    group by personId
    
  • 2021-01-03 02:37
    ;WITH cte(gap)
    AS
    (
        SELECT sum(b-a) from xxx GROUP BY uid
    )
    
    SELECT * FROM cte
    