Merge overlapping date intervals

不知归路 2020-11-27 16:16

Is there a better way of merging overlapping date intervals?
The solution I came up with is so simple that now I wonder if someone else has a better idea of how this cou

7 answers
  • 2020-11-27 16:44

    I was looking for the same solution and came across the post "Combine overlapping datetime to return single overlapping range record".

    There is another thread, "Packing Date Intervals".

    I tested this with various date ranges, including the ones listed here, and it works correctly every time.


    SELECT 
           s1.StartDate,
           --t1.EndDate 
           MIN(t1.EndDate) AS EndDate
    FROM @T s1 
    INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
      AND NOT EXISTS(SELECT * FROM @T t2 
                     WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate) 
    WHERE NOT EXISTS(SELECT * FROM @T s2 
                     WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate) 
    GROUP BY s1.StartDate 
    ORDER BY s1.StartDate 
    

    The result is:

    StartDate  | EndDate
    2010-01-01 | 2010-06-13
    2010-06-15 | 2010-06-25
    2010-06-26 | 2010-08-16
    2010-11-01 | 2010-12-31
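
    The query's logic can also be sketched outside SQL. The Python below is only an illustration of my reading of the query, not part of the original answer: the function name is made up, and intervals are assumed to be closed (start, end) date pairs. It keeps each start that no earlier-starting interval covers, keeps each end that no interval extends past, and pairs every kept start with the nearest kept end.

```python
from datetime import date


def merge_by_uncovered_endpoints(intervals):
    """Merge overlapping closed intervals by pairing uncovered starts and ends."""
    # Outer NOT EXISTS: drop a start if an earlier-starting interval still contains it.
    starts = sorted({s for s, _ in intervals
                     if not any(s2 < s <= e2 for s2, e2 in intervals)})
    # Inner NOT EXISTS: drop an end if some interval contains it and runs past it.
    ends = sorted({e for _, e in intervals
                   if not any(s2 <= e < e2 for s2, e2 in intervals)})
    # MIN(t1.EndDate) ... GROUP BY s1.StartDate: nearest surviving end per start.
    return [(s, min(e for e in ends if e >= s)) for s in starts]


sample = [
    (date(2010, 1, 1), date(2010, 3, 31)),
    (date(2010, 3, 1), date(2010, 6, 13)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 6, 26), date(2010, 7, 10)),
]
merged = merge_by_uncovered_endpoints(sample)
```

    As in the result above, intervals that merely touch (one ends on 06-25, the next starts on 06-26) stay separate, because only true overlaps are merged.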
    
  • 2020-11-27 16:45

    Here is a solution that uses just three simple scans: no CTEs, no recursion, no joins, no table updates in a loop, and no GROUP BY. As a result, I think it should scale better than the alternatives. The logic itself needs only two scans (a find-gaps pass applied twice); the third scan computes the min and max dates and can be skipped if they are known in advance. Note that LEAD requires SQL Server 2012 or later.

    declare @datefrom datetime, @datethru datetime
    
    DECLARE @T TABLE (d1 DATETIME, d2 DATETIME)
    
    INSERT INTO @T (d1, d2)
    
    SELECT '2010-01-01','2010-03-31' 
    UNION SELECT '2010-03-01','2010-06-13' 
    UNION SELECT '2010-04-01','2010-05-31' 
    UNION SELECT '2010-06-15','2010-06-25' 
    UNION SELECT '2010-06-26','2010-07-10' 
    UNION SELECT '2010-08-01','2010-08-05' 
    UNION SELECT '2010-08-01','2010-08-09' 
    UNION SELECT '2010-08-02','2010-08-07' 
    UNION SELECT '2010-08-08','2010-08-08' 
    UNION SELECT '2010-08-09','2010-08-12' 
    UNION SELECT '2010-07-04','2010-08-16' 
    UNION SELECT '2010-11-01','2010-12-31' 
    
    select @datefrom = min(d1) - 1, @datethru = max(d2) + 1 from @T
    
    SELECT 
    StartDate, EndDate
    FROM
    (
        SELECT 
        MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
        LEAD(StartDate ) OVER (ORDER BY StartDate) - 1 EndDate
        FROM
        (
            SELECT 
            StartDate, EndDate
            FROM
            (
                SELECT 
                MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
                LEAD(StartDate) OVER (ORDER BY StartDate) - 1 EndDate 
                FROM 
                (
                    SELECT d1 StartDate, d2 EndDate from @T 
                    UNION ALL 
                    SELECT @datefrom StartDate, @datefrom EndDate 
                    UNION ALL 
                    SELECT @datethru StartDate, @datethru EndDate
                ) T
            ) T
            WHERE StartDate <= EndDate
            UNION ALL 
            SELECT @datefrom StartDate, @datefrom EndDate 
            UNION ALL 
            SELECT @datethru StartDate, @datethru EndDate
        ) T
    ) T
    WHERE StartDate <= EndDate
    

    The result is:

    StartDate   EndDate
    2010-01-01  2010-06-13
    2010-06-15  2010-08-16
    2010-11-01  2010-12-31
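
    The "find gaps, applied twice" idea can be sketched in Python as follows. This is only an illustration under assumptions of mine: the function names are hypothetical, intervals are closed (start, end) date pairs at day granularity, and the sentinel rows play the role of @datefrom and @datethru.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def find_gaps(intervals, lo, hi):
    """Return the gaps between intervals, scanning in start order."""
    # Sentinel one-day rows at lo and hi bound the scan on both sides.
    rows = sorted(list(intervals) + [(lo, lo), (hi, hi)])
    gaps = []
    running_max = rows[0][1]
    for (_, end), (next_start, _) in zip(rows, rows[1:]):
        running_max = max(running_max, end)   # running MAX(EndDate)
        gap_start = running_max + ONE_DAY
        gap_end = next_start - ONE_DAY        # LEAD(StartDate) - 1
        if gap_start <= gap_end:              # keep non-empty gaps only
            gaps.append((gap_start, gap_end))
    return gaps


def merge_via_double_gaps(intervals):
    """The gaps of the gaps are the merged intervals."""
    lo = min(s for s, _ in intervals) - ONE_DAY
    hi = max(e for _, e in intervals) + ONE_DAY
    return find_gaps(find_gaps(intervals, lo, hi), lo, hi)


sample = [
    (date(2010, 1, 1), date(2010, 3, 31)),
    (date(2010, 3, 1), date(2010, 6, 13)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 6, 26), date(2010, 7, 10)),
    (date(2010, 11, 1), date(2010, 12, 31)),
]
merged = merge_via_double_gaps(sample)
```

    Because gaps are measured in whole days, intervals that touch (06-25 and 06-26) leave no gap between them and end up merged, which matches the three-row result above.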
    
  • 2020-11-27 16:52

    The idea is to simulate the scanning (sweep-line) algorithm for merging intervals. The solution is written to work across a wide range of SQL implementations; I've tested it on MySQL, Postgres, SQL Server 2017, SQLite and even Hive.

    Assuming the table schema is the following.

    CREATE TABLE t (
      a DATETIME,
      b DATETIME
    );
    

    We also assume the interval is half-open like [a,b).

    A row (a, i, j) in the view r below means that j intervals cover the point a and i intervals cover the previous point; i is NULL for the first point.

    CREATE VIEW r AS 
    SELECT a,
           Sum(d) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i,
           Sum(d) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS j
    FROM  (SELECT a, Sum(d) AS d
           FROM   (SELECT a,  1 AS d FROM t
                   UNION ALL
                   SELECT b, -1 AS d FROM t) e
           GROUP  BY a) f;
    

    We produce all the endpoints in the union of the intervals and pair up adjacent ones. Finally, we produce the set of intervals by only picking the odd-numbered rows.

    SELECT a, b
    FROM (SELECT a,
                 Lead(a)      OVER (ORDER BY a) AS b,
                 Row_number() OVER (ORDER BY a) AS n
          FROM   r
          WHERE  j=0 OR i=0 OR i is null) e
    WHERE  n%2 = 1;
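
    The same sweep can be sketched in Python. This is an illustration only: the function name is hypothetical, every interval is assumed to be half-open [a, b) with a < b, and plain integers stand in for datetimes.

```python
from collections import Counter
from itertools import accumulate


def union_half_open(intervals):
    """Union of half-open intervals [a, b) via a +1/-1 sweep over endpoints."""
    # Net change in coverage at each distinct point (the inner GROUP BY a).
    delta = Counter()
    for a, b in intervals:
        delta[a] += 1
        delta[b] -= 1
    points = sorted(delta)
    after = list(accumulate(delta[p] for p in points))  # j: coverage at the point
    before = [0] + after[:-1]                            # i: coverage just before it
    # A point is a boundary of the union when coverage is zero on either side.
    boundaries = [p for p, i, j in zip(points, before, after) if i == 0 or j == 0]
    # Boundaries alternate start, end, start, end, so pair them up in order.
    return list(zip(boundaries[0::2], boundaries[1::2]))


merged = union_half_open([(1, 3), (2, 5), (5, 7), (9, 10)])
```

    Here the first row's "previous coverage" is written as 0 where the SQL view yields NULL; both mark the first point as the start of an interval.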
    

    I've created a sample DB-fiddle and SQL-fiddle. I also wrote a blog post on union intervals in SQL.

  • 2020-11-27 17:00

    Try this

    ;WITH T1 AS
    (
        SELECT d1, d2, ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
        FROM @T
    ), NUMS AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
        FROM T1 A
        CROSS JOIN T1 B
        CROSS JOIN T1 C
    ), ONERANGE AS 
    (
        SELECT DISTINCT DATEADD(DAY, ROW_NUMBER() OVER(PARTITION BY T1.R ORDER BY (SELECT 0)) - 1, T1.D1) AS ELEMENT
        FROM T1
        CROSS JOIN NUMS
        WHERE NUMS.R <= DATEDIFF(DAY, d1, d2) + 1
    ), SEQUENCE AS
    (
        SELECT ELEMENT, DATEDIFF(DAY, '19000101', ELEMENT) - ROW_NUMBER() OVER(ORDER BY ELEMENT) AS rownum
        FROM ONERANGE
    )
    SELECT MIN(ELEMENT) AS StartDate, MAX(ELEMENT) as EndDate
    FROM SEQUENCE
    GROUP BY rownum
    

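    The basic idea is to first unroll the existing data so that you get a separate row for each day. This is done in ONERANGE, using NUMS as a source of row numbers; NUMS must therefore contain at least as many rows as the longest range has days.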

    Then, identify the relationship between how dates increment and the way the row numbers do. The difference remains constant within an existing range/island. As soon as you get to a new data island, the difference between them increases because the date increments by more than 1, while the row number increments by 1.
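
    A small Python sketch of the same "date minus row number" trick, for illustration only (the function name is made up, and intervals are assumed to be closed (start, end) date pairs):

```python
from datetime import date, timedelta
from itertools import groupby


def merge_by_day_islands(intervals):
    """Unroll intervals into distinct days, then group runs of consecutive days."""
    # One entry per covered day, duplicates removed (the ONERANGE step).
    days = sorted({start + timedelta(days=k)
                   for start, end in intervals
                   for k in range((end - start).days + 1)})
    merged = []
    # Day ordinal minus position stays constant inside a run of consecutive
    # days and jumps at every gap (the SEQUENCE step).
    for _, run in groupby(enumerate(days), key=lambda p: p[1].toordinal() - p[0]):
        run_days = [day for _, day in run]
        merged.append((run_days[0], run_days[-1]))
    return merged


sample = [
    (date(2010, 1, 1), date(2010, 1, 5)),
    (date(2010, 1, 3), date(2010, 1, 8)),
    (date(2010, 1, 9), date(2010, 1, 10)),
    (date(2010, 1, 15), date(2010, 1, 20)),
]
merged = merge_by_day_islands(sample)
```

    The cost grows with the number of covered days rather than the number of intervals, so this approach suits short ranges better than multi-year ones.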

  • 2020-11-27 17:02

    You asked this back in 2010 but don't specify any particular version.

    An answer for people on SQL Server 2012+

    WITH T1
         AS (SELECT *,
                    MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far
             FROM   @T),
         T2
         AS (SELECT *,
                    CASE
                      WHEN d1 <= DATEADD(DAY, 1, LAG(max_d2_so_far) OVER (ORDER BY d1))
                        THEN 0
                      ELSE 1
                    END AS range_start
             FROM   T1),
         T3
         AS (SELECT *,
                    SUM(range_start) OVER (ORDER BY d1) AS range_group
             FROM   T2)
    SELECT range_group,
           MIN(d1) AS d1,
           MAX(d2) AS d2
    FROM   T3
    GROUP  BY range_group 
    

    Which returns

    +-------------+------------+------------+
    | range_group |     d1     |     d2     |
    +-------------+------------+------------+
    |           1 | 2010-01-01 | 2010-06-13 |
    |           2 | 2010-06-15 | 2010-08-16 |
    |           3 | 2010-11-01 | 2010-12-31 |
    +-------------+------------+------------+
    

    DATEADD(DAY, 1, ...) is used because your desired results show that a period ending on 2010-06-25 should be collapsed into one starting on 2010-06-26. For other use cases this may need adjusting.
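
    The running-maximum logic, including the adjustable one-day allowance, can be sketched in Python. This is an illustration under my own assumptions: the function name and the tolerance parameter are hypothetical, and every interval is a closed (d1, d2) date pair with d1 <= d2.

```python
from datetime import date, timedelta


def merge_with_running_max(intervals, tolerance=timedelta(days=1)):
    """Scan in d1 order; start a new group when d1 lies beyond the running max."""
    merged = []
    for d1, d2 in sorted(intervals):
        # merged[-1][1] is the largest d2 seen so far (max_d2_so_far).
        if merged and d1 <= merged[-1][1] + tolerance:
            merged[-1][1] = max(merged[-1][1], d2)  # same range_group
        else:
            merged.append([d1, d2])                 # range_start = 1
    return [tuple(group) for group in merged]


sample = [
    (date(2010, 6, 26), date(2010, 7, 10)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 11, 1), date(2010, 12, 31)),
]
adjacent_merged = merge_with_running_max(sample)
overlap_only = merge_with_running_max(sample, tolerance=timedelta(0))
```

    Passing a zero tolerance corresponds to dropping the DATEADD call, so that only intervals that actually overlap are merged.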

  • 2020-11-27 17:02

    In this solution, I build a temporary Calendar table (here as a CTE) that stores a value for every day across a range. This type of table can be made static. My original version stored only 400-odd dates starting with 2009-12-31; if your dates span a larger range, you need more values.

    This solution only works with SQL Server 2005+ because it uses a CTE.

    With Calendar As
        (
        Select DateAdd(d, ROW_NUMBER() OVER ( ORDER BY s1.object_id ), '1900-01-01') As [Date]
        From sys.columns as s1
            Cross Join sys.columns as s2
        )
        , StopDates As
        (
        Select C.[Date]
        From Calendar As C
            Left Join @T As T
                On C.[Date] Between T.d1 And T.d2
        Where C.[Date] >= ( Select Min(T2.d1) From @T As T2 )
            And C.[Date] <= ( Select Max(T2.d2) From @T As T2 )
            And T.d1 Is Null
        )
        , StopDatesInUse As
        (
        Select D1.[Date]
        From StopDates As D1
            Left Join StopDates As D2
                On D1.[Date] = DateAdd(d,1,D2.Date)
        Where D2.[Date] Is Null
        )
        , DataWithEariestStopDate As 
        (
        Select *
        , (Select Min(SD2.[Date])
            From StopDatesInUse As SD2
            Where T.d2 < SD2.[Date] ) As StopDate
        From @T As T
        )
    Select Min(d1), Max(d2)
    From DataWithEariestStopDate
    Group By StopDate
    Order By Min(d1)
    

    EDIT: The problem with using dates in 2009 has nothing to do with the final query. The problem was that the Calendar table was not big enough: it started at 2009-12-31. I have revised it to start at 1900-01-01.
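
    The stop-date idea can be sketched in Python for illustration. The function name is hypothetical, intervals are assumed to be closed (d1, d2) date pairs, and the calendar is built only between the smallest d1 and the largest d2 rather than from a fixed start date.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_by_stop_dates(intervals):
    """Group intervals by the first uncovered day that follows them."""
    lo = min(d1 for d1, _ in intervals)
    hi = max(d2 for _, d2 in intervals)
    calendar = [lo + timedelta(days=k) for k in range((hi - lo).days + 1)]
    # StopDates: calendar days that no interval covers.
    uncovered = {day for day in calendar
                 if not any(d1 <= day <= d2 for d1, d2 in intervals)}
    # StopDatesInUse: only the first day of each run of uncovered days.
    stops = sorted(day for day in uncovered if day - ONE_DAY not in uncovered)
    groups = {}
    for d1, d2 in intervals:
        # Earliest stop date after this interval; None if no gap follows it.
        stop = min((day for day in stops if day > d2), default=None)
        low, high = groups.get(stop, (d1, d2))
        groups[stop] = (min(low, d1), max(high, d2))
    return sorted(groups.values())


sample = [
    (date(2010, 1, 1), date(2010, 3, 31)),
    (date(2010, 3, 1), date(2010, 6, 13)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 6, 26), date(2010, 7, 10)),
    (date(2010, 11, 1), date(2010, 12, 31)),
]
merged = merge_by_stop_dates(sample)
```

    Intervals that share the same next stop date have no uncovered day between them, so taking the minimum start and maximum end per stop date yields the merged ranges.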
