Condense Time Periods with SQL

春和景丽 2021-02-20 17:41

I have a large data set which, for the purpose of this question, has 3 fields:

  • Group Identifier
  • From Date
  • To Date

On any given row t

4 Answers
  •  有刺的猬
    2021-02-20 18:24

    I'd use a Calendar table: a table that simply holds one row per date, covering several decades.

    CREATE TABLE [dbo].[Calendar](
        [dt] [date] NOT NULL,
    CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED 
    (
        [dt] ASC
    ))
    

    There are many ways to populate such a table.

    For example, 100,000 rows (about 274 years) starting from 1900-01-01:

    INSERT INTO dbo.Calendar (dt)
    SELECT TOP (100000) 
        DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
    FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
    OPTION (MAXDOP 1);
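The INSERT above turns row numbers 1, 2, 3, ... into dates by adding (row number - 1) days to 1900-01-01, which produces a gap-free run of dates. A minimal Python sketch of the same arithmetic is shown below. It uses only the standard `datetime` module, and the helper `calendar_dates` is hypothetical, not part of the answer. It makes the "one row per day, no gaps" property easy to check:

```python
from datetime import date, timedelta
from typing import List


def calendar_dates(start: date, count: int) -> List[date]:
    """Hypothetical helper mirroring DATEADD(day, ROW_NUMBER() - 1, start).

    ROW_NUMBER() starts at 1, so subtracting 1 makes the first generated
    date equal to `start`; every later date is one day after the previous.
    """
    return [start + timedelta(days=rn - 1) for rn in range(1, count + 1)]


if __name__ == "__main__":
    dates = calendar_dates(date(1900, 1, 1), 100_000)
    print(dates[0], dates[-1])
    # Span in years, roughly 274 for 100,000 days.
    print(round(len(dates) / 365.2425, 1))
```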
    

    Once you have a Calendar table, here is how to use it.

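    Each original row is joined with the Calendar table to return one row per date between FromDate and ToDate, inclusive.

    Then duplicate dates, which come from overlapping periods, are removed.

    Then the classic gaps-and-islands technique is applied by numbering the rows with two sequences: a row number within the group and a day offset from a fixed date. Their difference stays constant across a run of consecutive dates and changes at every gap, so it identifies each island.

    Finally, the rows of each island are grouped together to get the new From and To dates.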
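The four steps above can be sketched in plain Python to make the island arithmetic concrete. This is an illustration under stated assumptions, not the answer's SQL. `condense_periods` is a hypothetical name, periods are `(group_id, from_date, to_date)` tuples with both ends inclusive, and `date.toordinal()` plays the role of the day offset (Seq2):

```python
from datetime import date, timedelta
from itertools import groupby
from typing import Dict, List, Set, Tuple

# (group_id, from_date, to_date), both ends inclusive.
Period = Tuple[int, date, date]


def condense_periods(periods: List[Period]) -> List[Period]:
    """Hypothetical helper: merge overlapping or adjacent periods per group."""
    # Steps 1 and 2: expand each period into one entry per day; using a set
    # drops duplicate days produced by overlapping periods.
    days: Dict[int, Set[date]] = {}
    for group_id, start, end in periods:
        day = start
        while day <= end:
            days.setdefault(group_id, set()).add(day)
            day += timedelta(days=1)

    result: List[Period] = []
    for group_id in sorted(days):
        ordered = sorted(days[group_id])
        # Step 3: island key = day offset (Seq2) minus row number (Seq1).
        keyed = [(d.toordinal() - rn, d) for rn, d in enumerate(ordered, start=1)]
        # Step 4: each run of equal keys is one island; keep its first/last day.
        for _, run in groupby(keyed, key=lambda pair: pair[0]):
            island = [d for _, d in run]
            result.append((group_id, island[0], island[-1]))
    return result


if __name__ == "__main__":
    print(condense_periods([
        (1, date(2012, 1, 1), date(2012, 12, 31)),
        (1, date(2013, 1, 1), date(2013, 12, 31)),
        (1, date(2015, 1, 1), date(2015, 12, 31)),
    ]))
```

Because the dates are sorted, the key never decreases, so `itertools.groupby`, which only merges adjacent equal keys, sees each island as a single run.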

    Sample data

    I added a second group.

    DECLARE @T TABLE (GroupID int, FromDate date, ToDate date);
    INSERT INTO @T (GroupID, FromDate, ToDate) VALUES
    (1, '2012-01-01', '2012-12-31'),
    (1, '2013-12-01', '2014-11-30'),
    (1, '2015-01-01', '2015-12-31'),
    (1, '2015-01-01', '2015-12-31'),
    (1, '2015-02-01', '2015-03-31'),
    (1, '2013-01-01', '2013-12-31'),
    (2, '2012-01-01', '2012-12-31'),
    (2, '2013-01-01', '2013-12-31');
    

    Query

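    WITH
    -- Expand every period into one row per calendar date; DISTINCT drops
    -- dates that are covered by more than one (overlapping) period.
    CTE_AllDates
    AS
    (
        SELECT DISTINCT
            T.GroupID
            ,CA.dt
        FROM
            @T AS T
            CROSS APPLY
            (
                SELECT dbo.Calendar.dt
                FROM dbo.Calendar
                WHERE
                    dbo.Calendar.dt >= T.FromDate
                    AND dbo.Calendar.dt <= T.ToDate
            ) AS CA
    )
    -- Seq1 numbers the dates 1, 2, 3, ... within each group; Seq2 is the day
    -- offset from a fixed date. Their difference (IslandNumber) is constant
    -- for a run of consecutive dates and changes after every gap.
    ,CTE_Sequences
    AS
    (
        SELECT
            GroupID
            ,dt
            ,ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS Seq1
            ,DATEDIFF(day, '2001-01-01', dt) AS Seq2
            ,DATEDIFF(day, '2001-01-01', dt) -
                ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS IslandNumber
        FROM CTE_AllDates
    )
    -- Each (GroupID, IslandNumber) pair is one condensed period.
    SELECT
        GroupID
        ,MIN(dt) AS NewFromDate
        ,MAX(dt) AS NewToDate
    FROM CTE_Sequences
    GROUP BY GroupID, IslandNumber
    ORDER BY GroupID, NewFromDate;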
    

    Result

    +---------+-------------+------------+
    | GroupID | NewFromDate | NewToDate  |
    +---------+-------------+------------+
    |       1 | 2012-01-01  | 2014-11-30 |
    |       1 | 2015-01-01  | 2015-12-31 |
    |       2 | 2012-01-01  | 2013-12-31 |
    +---------+-------------+------------+
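If SQL Server is not at hand, the same query shape can be tried in SQLite through Python's standard `sqlite3` module. This is a hedged port, not the original T-SQL. It assumes SQLite 3.25 or later (needed for `ROW_NUMBER()`), replaces dbo.Calendar and CROSS APPLY with a recursive CTE that walks each period one day at a time, and replaces DATEDIFF with `julianday()` differences. The table name `T` and the function `condense_sqlite` are hypothetical:

```python
import sqlite3

# Sample rows from the answer: (GroupID, FromDate, ToDate), ends inclusive.
SAMPLE = [
    (1, "2012-01-01", "2012-12-31"),
    (1, "2013-12-01", "2014-11-30"),
    (1, "2015-01-01", "2015-12-31"),
    (1, "2015-01-01", "2015-12-31"),
    (1, "2015-02-01", "2015-03-31"),
    (1, "2013-01-01", "2013-12-31"),
    (2, "2012-01-01", "2012-12-31"),
    (2, "2013-01-01", "2013-12-31"),
]

# SQLite port (assumption: SQLite 3.25+ for window functions).
QUERY = """
WITH RECURSIVE
CTE_Expand(GroupID, dt, ToDate) AS (
    -- Recursive CTE standing in for the Calendar table: one row per day.
    SELECT GroupID, FromDate, ToDate FROM T
    UNION ALL
    SELECT GroupID, date(dt, '+1 day'), ToDate
    FROM CTE_Expand
    WHERE dt < ToDate
),
CTE_AllDates AS (
    SELECT DISTINCT GroupID, dt FROM CTE_Expand
),
CTE_Sequences AS (
    -- Day offset from 2001-01-01 minus row number, as in the T-SQL query.
    SELECT GroupID, dt,
           CAST(julianday(dt) - julianday('2001-01-01') AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY dt) AS IslandNumber
    FROM CTE_AllDates
)
SELECT GroupID, MIN(dt) AS NewFromDate, MAX(dt) AS NewToDate
FROM CTE_Sequences
GROUP BY GroupID, IslandNumber
ORDER BY GroupID, NewFromDate;
"""


def condense_sqlite(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE T (GroupID INTEGER, FromDate TEXT, ToDate TEXT)")
        conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in condense_sqlite(SAMPLE):
        print(row)
```

ISO `YYYY-MM-DD` strings sort and compare correctly as text, which is why `dt < ToDate`, `MIN` and `MAX` work on them directly here.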
    
