SQL to determine multiple date ranges (SQL Server 2000)

后端 未结 5 1099
小蘑菇
小蘑菇 2021-01-15 01:20

I have a table which contains an ID and a Date for an event. Each row is for one date. I am trying to determine consecutive date ranges and consoli

相关标签:
5条回答
  • 2021-01-15 01:31

    This SO Question might help you. I linked directly to Rob Farley's answer as I feel this is a similar problem.

    0 讨论(0)
  • 2021-01-15 01:32

    This is the way I've done it in the past. It's a two step process:

    1. Build the set of candidate contiguous periods
    2. If there are any overlapping periods, delete all but the longest such period.

    Here's a script that shows how it's done. You might be able to pull it off in a single [bug, ugly] query, but trying to do that makes my head hurt. I'm using temp tables as it makes the debugging a whole lot easier.

    drop table #source
    create table #source
    (
      id    int      not null ,
      dtCol datetime not null ,
    
      -----------------------------------------------------------------------
      -- ASSUMPTION 1: Each date must be unique for a given ID value.
      -----------------------------------------------------------------------
      unique clustered ( id , dtCol ) ,
    
      -----------------------------------------------------------------------
      -- ASSUMPTION 2: The datetime column only represents a day.
      -- The value of the time component is always 00:00:00.000
      -----------------------------------------------------------------------
      check ( dtCol = convert(datetime,convert(varchar,dtCol,112),112) ) ,
    
    )
    go
    
    insert #source values(1,'jan 1, 2011')
    insert #source values(1,'jan 4, 2011')
    insert #source values(1,'jan 5, 2011')
    insert #source values(2,'jan 1, 2011')
    insert #source values(2,'jan 2, 2011')
    insert #source values(2,'jan 3, 2011')
    insert #source values(2,'jan 5, 2011')
    insert #source values(3,'jan 1, 2011')
    insert #source values(4,'jan 1, 2011')
    insert #source values(4,'jan 2, 2011')
    insert #source values(4,'jan 3, 2011')
    insert #source values(4,'jan 4, 2011')
    go
    
    insert #source values( 200236 , '2011-01-02')
    insert #source values( 200236 , '2011-01-03')
    insert #source values( 200236 , '2011-01-05')
    insert #source values( 200236 , '2011-01-06')
    insert #source values( 200236 , '2011-01-07')
    insert #source values( 200236 , '2011-01-08')
    insert #source values( 200236 , '2011-01-09')
    insert #source values( 200236 , '2011-01-10')
    insert #source values( 200236 , '2011-01-11')
    insert #source values( 200236 , '2011-01-12')
    insert #source values( 200236 , '2011-01-13')
    insert #source values( 200236 , '2011-01-15')
    insert #source values( 200236 , '2011-01-16')
    insert #source values( 200236 , '2011-01-17')
    go
    
    drop table #candidate_range
    go
    create table #candidate_range
    (
      rowId   int      not null identity(1,1) ,
      id      int      not null ,
      dtFrom  datetime not null ,
      dtThru  datetime not null ,
      length  as 1+datediff(day,dtFrom,dtThru) ,
    
      primary key nonclustered ( rowID ) ,
      unique clustered (id,dtFrom,dtThru) ,
    
    )
    go
    
    --
    -- seed the candidate range table with the set of all possible contiguous ranges for each id
    --
    insert #candidate_range ( id , dtFrom , dtThru )
    select id      = tFrom.id    ,
           valFrom = tFrom.dtCol ,
           valThru = tThru.dtCol
    from #source tFrom
    join #source tThru on tThru.id     = tFrom.id
                      and tThru.dtCol >= tFrom.dtCol
    where 1+datediff(day,tFrom.dtCol,tThru.dtCol) = ( select count(*)
                                                      from #source t
                                                      where t.id = tFrom.id
                                                        and t.dtCol between tFrom.dtCol and tThru.dtCol
                                                    )
    order by 1,2,3
    go
    
    --
    -- compare the table to itself. If we find overlapping periods,
    -- we'll keep the longest such period and delete the shorter overlapping periods.
    --
    delete t2
    from #candidate_range t1
    join #candidate_range t2 on t2.id      = t1.id
                            and t2.rowId  != t1.rowID
                            and t2.length <  t1.length
                            and t2.dtFrom <= t1.dtThru
                            and t2.dtThru >= t1.dtFrom
    go
    

    That's about all there is to it.

    0 讨论(0)
  • 2021-01-15 01:42

    One approach you can take is to add a field that indicates the next date in the sequence. (Either add it to your current table or use a temporary table, store the underlying data to the temp table and then update the next date in the sequence).

    Your starting data structure would look something like this:

    ID, PerfDate, NextDate
    200236, 2011-01-02, 2011-01-03
    200236, 2011-01-03, 2011-01-04
    etc.
    

    You can then use a series of correlated subqueries to roll the data up into the desired output:

    SELECT ID, StartDate, EndDate
    FROM (
    SELECT DISTINCT ID, PerfDate AS StartDate, 
        (SELECT MIN([PerfDate]) FROM [SourceTable] S3
        WHERE S3.ID = S1.ID
        AND S3.NextDate > S1.PerfDate
        AND ISNULL(
            (SELECT MIN(PerfDate) 
            FROM [SourceTable] AS S4
            WHERE S4.ID = S1.ID 
            AND S4.NextDate > S3.NextDate), S3.NextDate + 1) > S3.NextDate) AS EndDate
    FROM [SourceTable] S1
    WHERE 
        ISNULL(
            (SELECT MAX(NextDate) 
            FROM [SourceTable] S2 
            WHERE S2.ID = S1.ID 
            AND S2.PerfDate < S1.PerfDate), PerfDate -1) < S1.PerfDate)q
    ORDER BY q.ID, q.StartDate
    
    0 讨论(0)
  • 2021-01-15 01:48

    I've just done this similar thing in SQL Server 2008. I think the following translation will work for SQL Server 2000:

    -- Create table variable
    DECLARE @StartTable TABLE
    (
      rowid INT IDENTITY(1,1) NOT NULL,
      userid int,
      startDate date
    )
    
    Insert Into @StartTable(userid, startDate)
    --This finds the start dates by finding unmatched values
    SELECT t1.ID, t1.[Date]
    FROM Example As t1
    LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID 
       And DateAdd(day, 1, t2.[Date]) = t1.[Date]
    WHERE t2.[Date] Is NULL
    ORDER BY t1.ID, t1.[Date]
    
    -- Create table variable
    DECLARE @EndTable TABLE
    (
      rowid INT IDENTITY(1,1) NOT NULL,
      userid int,
      endDate date
    )
    
    Insert Into @EndTable(userid, endDate)
    --This finds the end dates by getting unmatched values 
    SELECT t1.ID, t1.[Date]
    FROM Example As t1
    LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID
       And DateAdd(day, -1, t2.[Date]) = t1.[Date]
    WHERE t2.[Date] IS NULL
    ORDER BY t1.ID, t1.[Date]
    
    Select eT.userid, startDate, endDate 
    From @EndTable eT
    INNER JOIN @StartTable sT On eT.userid = sT.userid 
    AND eT.rowid = sT.rowid;
    

    So as you can see, I created two table variables, one for starts and one for ends, by self-joining the table on the date either just prior to or just after the date in the [Date] column. This means that I'm selecting only records that don't have a date prior (so these would be at the beginning of a period) for the Start Table and those that have no date following (so these would be at the end of a period) for the End Table.

    When these are inserted into the table variable, they are numbered in sequence because of the Identity column. Then I join the two table variables together. Because they are ordered, the start and end dates should always match up properly.

    This solution works for me because I have at most one record per ID per day and I am only interested in days, not hours, etc. Even though it is several steps, I like it because it is conceptually simple and eliminates matched records without having cursors or loops. I hope it will work for you too.

    0 讨论(0)
  • 2021-01-15 01:55
    SELECT ...
    FROM   ...
    WHERE  date_column BETWEEN '2011-01-02' AND '2011-01-15'
    

    perhaps? Reference

    Or you can do a sub-query and link the next record using a MAX where date is <= current date:

    SELECT id, date, (SELECT MAX(date) FROM mytable WHERE date <= mytable.date) AS nextDate
    FROM   mytable
    

    Or use:

    SELECT TOP 1 date
    FROM         mytable
    WHERE        date <= mytable.date AND id <> mytable.id
    ORDER BY     date
    

    As the sub-query so it grabs the next date in line after the current record.

    0 讨论(0)
提交回复
热议问题