How to select overlapping date ranges in SQL

被撕碎了的回忆 2021-01-22 10:44

I have a table with the following columns: sID, start_date, and end_date.

Some of the values are as follows:

1   1995-07-28  2003-07-20 
1   2003-07-21  2         


        
2 Answers
  • 2021-01-22 10:57

    Your logic is not quite correct, although it almost works on your sample data. It fails because BETWEEN includes its end points, so every row matches itself. Beyond that, the logic still misses the case where one period lies entirely inside another:

     a-------------a
          b----b
    

    Here is correct logic:

    select a.*
    from YourTable a  -- placeholder table name; "table" itself is a reserved word
    where exists (select 1
                  from YourTable b
                  where a.sid = b.sid and
                        a.start_date < b.end_date and
                        a.end_date > b.start_date and
                        (a.start_date <> b.start_date or  -- filter out the record itself
                         a.end_date <> b.end_date
                        )
                 )
    order by a.end_date;
    

    The rule for overlapping time periods (or ranges of any sort) is that period 1 overlaps period 2 when period 1 starts before period 2 ends and period 1 ends after period 2 starts. Happily, BETWEEN is neither needed nor useful here. (I strongly discourage using BETWEEN with date/time operands.)
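
    To make the rule concrete, here is a small sketch on made-up rows. The table name `periods` and its values are assumptions for illustration and do not come from the question. It includes the containment case from the diagram above.

    ```sql
    -- Hypothetical sample data; table name and values are illustrative only.
    CREATE TABLE periods (sid INT, start_date DATE, end_date DATE);

    INSERT INTO periods (sid, start_date, end_date) VALUES
        (1, '2000-01-01', '2000-12-31'),  -- a: contains b
        (1, '2000-03-01', '2000-04-30'),  -- b: lies inside a
        (1, '2001-06-01', '2001-06-30'),  -- c: overlaps nothing
        (2, '2000-02-01', '2000-02-28');  -- different sid, never compared with sid 1

    -- Keep a row when another row with the same sid starts before it ends
    -- and ends after it starts.
    SELECT a.*
    FROM periods a
    WHERE EXISTS (SELECT 1
                  FROM periods b
                  WHERE a.sid = b.sid
                    AND a.start_date < b.end_date
                    AND a.end_date > b.start_date
                    AND (a.start_date <> b.start_date OR a.end_date <> b.end_date))
    ORDER BY a.end_date;
    -- Returns rows b and a, in that order; c and the sid 2 row are excluded.
    ```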

    I should note that this version does not consider two time periods to overlap when one ends on the same day another begins. That is easily adjusted by changing the < and > to <= and >=.
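
    As a sketch of that adjustment, assume two made-up periods that share a single day (the table name `touching` and its rows are hypothetical):

    ```sql
    -- Hypothetical rows: the second period starts on the day the first one ends.
    CREATE TABLE touching (sid INT, start_date DATE, end_date DATE);

    INSERT INTO touching (sid, start_date, end_date) VALUES
        (1, '2003-01-01', '2003-07-20'),
        (1, '2003-07-20', '2003-12-31');

    -- Inclusive comparison: periods that merely touch count as overlapping.
    SELECT a.*
    FROM touching a
    WHERE EXISTS (SELECT 1
                  FROM touching b
                  WHERE a.sid = b.sid
                    AND a.start_date <= b.end_date
                    AND a.end_date >= b.start_date
                    AND (a.start_date <> b.start_date OR a.end_date <> b.end_date))
    ORDER BY a.end_date;
    -- Returns both rows. With the strict < and > comparisons it returns none.
    ```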

    Here is a SQL Fiddle.

  • 2021-01-22 11:01

    One way of doing this reasonably efficiently is:

    WITH T1
         AS (SELECT *,
                    MAX(end_date) OVER (PARTITION BY sID ORDER BY start_date) AS max_end_date_so_far
             FROM   YourTable),
         T2
         AS (SELECT *,
                    range_start = IIF(start_date <= LAG(max_end_date_so_far) OVER (PARTITION BY sID ORDER BY start_date), 0, 1),
                    next_range_start = IIF(LEAD(start_date) OVER (PARTITION BY sID ORDER BY start_date) <= max_end_date_so_far, 0, 1)
             FROM   T1)
    SELECT SId,
           start_date,
           end_date
    FROM   T2
    WHERE  0 IN ( range_start, next_range_start ); 
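
    To see what the two flags do, here is a sketch that runs the same CTEs on made-up rows and selects the intermediate columns instead of filtering. The data is hypothetical; the syntax is T-SQL, like the query above.

    ```sql
    -- Hypothetical data for illustration; values are not from the question.
    CREATE TABLE YourTable (sID INT, start_date DATE, end_date DATE);

    INSERT INTO YourTable (sID, start_date, end_date) VALUES
        (1, '2000-01-01', '2000-12-31'),  -- A
        (1, '2000-03-01', '2000-04-30'),  -- B: inside A
        (1, '2000-06-01', '2000-06-30'),  -- C: inside A, starts after B has ended
        (1, '2001-06-01', '2001-06-30');  -- D: isolated

    WITH T1
         AS (SELECT *,
                    MAX(end_date) OVER (PARTITION BY sID ORDER BY start_date) AS max_end_date_so_far
             FROM   YourTable),
         T2
         AS (SELECT *,
                    range_start = IIF(start_date <= LAG(max_end_date_so_far) OVER (PARTITION BY sID ORDER BY start_date), 0, 1),
                    next_range_start = IIF(LEAD(start_date) OVER (PARTITION BY sID ORDER BY start_date) <= max_end_date_so_far, 0, 1)
             FROM   T1)
    SELECT start_date,
           end_date,
           max_end_date_so_far,
           range_start,
           next_range_start
    FROM   T2
    ORDER  BY start_date;

    -- start_date  end_date    max_end_date_so_far  range_start  next_range_start
    -- 2000-01-01  2000-12-31  2000-12-31           1            0
    -- 2000-03-01  2000-04-30  2000-12-31           0            0
    -- 2000-06-01  2000-06-30  2000-12-31           0            1
    -- 2001-06-01  2001-06-30  2001-06-30           1            1
    ```

    Rows A, B and C each have a 0 in one of the flags and are returned by the final filter; D has 1 in both and is not. Row C is caught only because the running maximum still holds A's end date, since the immediately preceding row B ends before C starts.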
    

    If you have an index on (sID, start_date) INCLUDE (end_date), this can do the work with a single ordered scan.
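
    A sketch of such an index in SQL Server syntax; the index name is hypothetical, and it assumes the table is called YourTable as in the query above:

    ```sql
    -- Hypothetical index name. INCLUDE is not available in every database.
    CREATE INDEX IX_YourTable_sID_start_date
        ON YourTable (sID, start_date)
        INCLUDE (end_date);
    ```

    The key columns match the PARTITION BY sID ORDER BY start_date used by the window functions, and end_date is carried along so the query does not have to touch the base table.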
