Find conflicted date intervals using SQL

前端 未结 5 1282
梦谈多话
梦谈多话 2021-01-05 18:28

Suppose I have following table in Sql Server 2008:

ItemId StartDate   EndDate
1      NULL        2011-01-15
2      2011-01-16  2011-01-25
3      2011-01-26           


        
相关标签:
5条回答
  • 2021-01-05 18:32

    This will give you the rows that are incorrect.

    Added ROW_NUMBER() as I didnt know if all entries where in order.

    -- Testdata
    declare @date datetime = '2011-01-17'
    
    ;with yourTable(itemID, startDate, endDate)
    as
    (
        SELECT  1,  NULL, @date
        UNION ALL
        SELECT  2,  dateadd(day, -1, @date),    DATEADD(day, 10, @date)
        UNION ALL
        SELECT  3,  DATEADD(day, 60, @date),    NULL
    )
    
    -- End testdata
    
    ,tmp
    as
    (
        select  *
                ,ROW_NUMBER() OVER(order by startDate) as rowno 
        from    yourTable
    )
    
    select  *
    from    tmp t1
    left join   tmp t2
        on t1.rowno = t2.rowno - 1
    where   t1.endDate > t2.startDate
    

    EDIT: As for the updated question:

    Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.

    -- Testdata
    declare @date datetime = '2011-01-17'
    
    ;with yourTable(itemID, startDate, endDate, intervalID)
    as
    (
        SELECT  1,  NULL, @date, 1
        UNION ALL
        SELECT  2,  dateadd(day, 1, @date), DATEADD(day, 10, @date),1
        UNION ALL
        SELECT  3,  DATEADD(day, 60, @date),    NULL,   1
        UNION ALL
        SELECT  4,  NULL, @date, 2
        UNION ALL
        SELECT  5,  dateadd(day, -1, @date),    DATEADD(day, 10, @date),2
        UNION ALL
        SELECT  6,  DATEADD(day, 60, @date),    NULL,   2
    )
    
    -- End testdata
    
    ,tmp
    as
    (
        select  *
                ,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno 
        from    yourTable
    )
    
    select  *
    from    tmp t1
    left join   tmp t2
        on t1.rowno = t2.rowno - 1
        and t1.intervalID = t2.intervalID
    where   t1.endDate > t2.startDate
    
    0 讨论(0)
  • 2021-01-05 18:35

    I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.

    CREATE TABLE intervalStarts
    (
      ItemId      int,
      IntervalId  int,
      StartDate   datetime
    )
    
    CREATE VIEW intervals
    AS
    with cte as (
      select ItemId, IntervalId, StartDate,
         row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
      from intervalStarts
    )
    select c1.ItemId, c1.IntervalId, c1.StartDate,
      dateadd(dd,-1,c2.StartDate) as 'EndDate'
    from cte c1
      left join cte c2 on c1.IntervalId=c2.IntervalId
                        and c1.row=c2.row-1
    

    So, sample data might look like:

    INSERT INTO intervalStarts
    select 1, 1, null union
    select 2, 1, '2011-01-16' union
    select 3, 1, '2011-01-26' union
    select 4, 2, null union
    select 5, 2, '2011-01-26' union
    select 6, 2, '2011-01-14'
    

    and a simple SELECT * FROM intervals yields:

    ItemId | IntervalId | StartDate  | EndDate
    1      | 1          | null       | 2011-01-15
    2      | 1          | 2011-01-16 | 2011-01-25
    3      | 1          | 2011-01-26 | null
    4      | 2          | null       | 2011-01-13
    6      | 2          | 2011-01-14 | 2011-01-25
    5      | 2          | 2011-01-26 | null
    
    0 讨论(0)
  • 2021-01-05 18:39

    Not directly related to the OP, but since Adrian's expressed an interest. Here's a table than SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merge's directly, rather than maintaining through triggers).

    In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:

    create table dbo.Clients (
        ClientID int IDENTITY(1,1) not null,
        Name varchar(50) not null,
        /* Other columns */
        constraint PK_Clients PRIMARY KEY (ClientID)
    )
    go
    create table dbo.DataItems (
        DataItemID int IDENTITY(1,1) not null,
        Name varchar(50) not null,
        /* Other columns */
        constraint PK_DataItems PRIMARY KEY (DataItemID),
        constraint UQ_DataItem_Names UNIQUE (Name)
    )
    go
    

    Now, if we were building a normal table, we'd have the following (Don't run this one):

    create table dbo.ClientAnswers (
        ClientID int not null,
        DataItemID int not null,
        IntValue int not null,
        Comment varchar(max) null,
        constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
        constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
        constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
    )
    

    But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:

    create table dbo.ClientAnswerHistories (
        ClientID int not null,
        DataItemID int not null,
        IntValue int null,
        Comment varchar(max) null,
    
        /* Temporal columns */
        Deleted bit not null,
        ValidFrom datetime2 null,
        ValidTo datetime2 null,
        constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
        constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
        constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
        constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
        constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
        constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
            references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
        constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
            references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
        constraint CK_ClientAnswerHistory_DeletionNull CHECK (
            Deleted = 0 or
            (
                IntValue is null and
                Comment is null
            )),
        constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
    )
    go
    

    That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:

    create view dbo.ClientAnswers
    with schemabinding
    as
        select
            ClientID,
            DataItemID,
            ISNULL(IntValue,0) as IntValue,
            Comment
        from
            dbo.ClientAnswerHistories
        where
            Deleted = 0 and
            ValidTo is null
    go
    create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
    go
    

    And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:

    create function dbo.ClientAnswers_At (
        @At datetime2
    )
    returns table
    with schemabinding
    as
        return (
            select
                ClientID,
                DataItemID,
                ISNULL(IntValue,0) as IntValue,
                Comment
            from
                dbo.ClientAnswerHistories
            where
                Deleted = 0 and
                (ValidFrom is null or ValidFrom <= @At) and
                (ValidTo is null or ValidTo > @At)
        )
    go
    

    And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.

    create trigger T_ClientAnswers_I
    on dbo.ClientAnswers
    instead of insert
    as
        set nocount on
        ;with Dup as (
            select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
            from
                inserted i
                    left join
                dbo.ClientAnswerHistories cah
                    on
                        i.ClientID = cah.ClientID and
                        i.DataItemID = cah.DataItemID and
                        cah.ValidTo is null and
                        cah.Deleted = 1
                    cross join
                (select 0 union all select 1) t(Dupl)
        )
        merge into dbo.ClientAnswerHistories cah
        using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
        when matched then update set ValidTo = SYSDATETIME()
        when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
        values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
    go
    create trigger T_ClientAnswers_U
    on dbo.ClientAnswers
    instead of update
    as
        set nocount on
        ;with Dup as (
            select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
            from
                inserted i
                    cross join
                (select 0 union all select 1) t(Dupl)
        )
        merge into dbo.ClientAnswerHistories cah
        using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
        when matched then update set ValidTo = SYSDATETIME()
        when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
        values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
    go
    create trigger T_ClientAnswers_D
    on dbo.ClientAnswers
    instead of delete
    as
        set nocount on
        ;with Dup as (
            select d.ClientID,d.DataItemID,t.Dupl
            from
                deleted d
                    cross join
                (select 0 union all select 1) t(Dupl)
        )
        merge into dbo.ClientAnswerHistories cah
        using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
        when matched then update set ValidTo = SYSDATETIME()
        when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
        values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
    go
    

    Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that, the base table, ClientAnswerHistories is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.

    Things get more complex when you need to deal with temporal foreign keys.


    Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.

    0 讨论(0)
  • 2021-01-05 18:52

    Something like this should give you all overlaping periods

    SELECT
    * 
    FROM
    mytable t1 
    JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate 
    

    Edited for Adrians comment bellow

    0 讨论(0)
  • 2021-01-05 18:53
    declare @T table (ItemId int, IntervalID int, StartDate datetime,   EndDate datetime)
    
    insert into @T
    select 1, 1,  NULL,        '2011-01-15' union all
    select 2, 1, '2011-01-16', '2011-01-25' union all
    select 3, 1, '2011-01-26',  NULL        union all
    select 4, 2,  NULL,        '2011-01-17' union all
    select 5, 2, '2011-01-16', '2011-01-25' union all
    select 6, 2, '2011-01-26',  NULL
    
    select T1.*
    from @T as T1
      inner join @T as T2
        on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
           coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
           T1.IntervalID = T2.IntervalID and
           T1.ItemId <> T2.ItemId
    

    Result:

    ItemId      IntervalID  StartDate               EndDate
    ----------- ----------- ----------------------- -----------------------
    5           2           2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
    4           2           NULL                    2011-01-17 00:00:00.000
    
    0 讨论(0)
提交回复
热议问题