I have a table with the following columns: sID, start_date and end_date.
Some of the values are as follows:
1 1995-07-28 2003-07-20
1 2003-07-21 2
Your logic is not totally correct, although it almost works on your sample data. The specific reason it fails is that BETWEEN includes the end points, so any given row matches itself. That said, the logic still isn't correct because it doesn't catch this situation:
a-------------a
    b----b
Here is correct logic:
select a.*
from table a
where exists (select 1
              from table b
              where a.sid = b.sid and
                    a.start_date < b.end_date and
                    a.end_date > b.start_date and
                    (a.start_date <> b.start_date or  -- filter out the record itself
                     a.end_date <> b.end_date
                    )
             )
order by a.end_date;
The rule for overlapping time periods (or ranges of any sort) is that period 1 overlaps with period 2 when period 1 starts before period 2 ends and period 1 ends after period 2 starts. Happily, there is no need or use for BETWEEN for this purpose. (I strongly discourage using BETWEEN with date/time operands.)
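To make the rule concrete, here is a minimal Python sketch of it, compared with an endpoint-based check on the "a contains b" situation. The function names and dates are made up for illustration, and `endpoint_inside` is only an assumed stand-in for a BETWEEN-style test, not the query from the question.

```python
from datetime import date

def overlaps(start1, end1, start2, end2):
    # The rule from the text: period 1 starts before period 2 ends
    # and period 1 ends after period 2 starts.
    return start1 < end2 and end1 > start2

def endpoint_inside(start1, end1, start2, end2):
    # Assumed BETWEEN-style check (hypothetical): is either endpoint of
    # period 1 inside period 2, end points included?
    return start2 <= start1 <= end2 or start2 <= end1 <= end2

# Period a completely contains period b (illustrative dates).
a = (date(2000, 1, 1), date(2000, 12, 31))
b = (date(2000, 3, 1), date(2000, 4, 30))

print(overlaps(*a, *b))         # True
print(endpoint_inside(*a, *b))  # False: neither endpoint of a lies inside b
```

The endpoint check misses the overlap when it is evaluated for the containing period, which is the situation in the diagram.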
I should note that this version does not consider two time periods to overlap when one ends on the same day another begins. That is easily adjusted by changing the < and > to <= and >=.
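As a rough way to see the effect of that change, the sketch below runs the query above against an in-memory SQLite database from Python, once with the strict operators and once with the inclusive ones. The table name `periods` and the sample rows are assumptions made for this example; dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

# Hypothetical sample data: (sid, start_date, end_date).
ROWS = [
    (1, "2000-01-01", "2000-12-31"),  # contains the next row
    (1, "2000-03-01", "2000-04-30"),
    (1, "2001-01-01", "2001-06-30"),  # ends on the day the next row starts
    (1, "2001-06-30", "2001-12-31"),
    (2, "2005-01-01", "2005-12-31"),  # only row for this sid
]

# Same logic as the query above; {lt} and {gt} are filled in below.
QUERY = """
select a.start_date
from periods a
where exists (select 1
              from periods b
              where a.sid = b.sid and
                    a.start_date {lt} b.end_date and
                    a.end_date {gt} b.start_date and
                    (a.start_date <> b.start_date or
                     a.end_date <> b.end_date))
order by a.start_date
"""

def overlapping_starts(lt, gt):
    """Return start dates of rows that overlap another row with the same sid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table periods (sid integer, start_date text, end_date text)"
    )
    conn.executemany("insert into periods values (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY.format(lt=lt, gt=gt))]
    conn.close()
    return result

strict = overlapping_starts("<", ">")
inclusive = overlapping_starts("<=", ">=")
print(strict)     # ['2000-01-01', '2000-03-01']
print(inclusive)  # ['2000-01-01', '2000-03-01', '2001-01-01', '2001-06-30']
```

With the strict operators only the contained pair is reported; the inclusive operators also report the two periods that share 2001-06-30.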
One way of doing this reasonably efficiently is:
WITH T1
     AS (SELECT *,
                MAX(end_date) OVER (PARTITION BY sID ORDER BY start_date) AS max_end_date_so_far
         FROM   YourTable),
     T2
     AS (SELECT *,
                range_start = IIF(start_date <= LAG(max_end_date_so_far) OVER (PARTITION BY sID ORDER BY start_date), 0, 1),
                next_range_start = IIF(LEAD(start_date) OVER (PARTITION BY sID ORDER BY start_date) <= max_end_date_so_far, 0, 1)
         FROM   T1)
SELECT sID,
       start_date,
       end_date
FROM   T2
WHERE  0 IN ( range_start, next_range_start );
If you have an index on (sID, start_date) INCLUDE (end_date), this can perform the work with a single ordered scan.
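For readers who find the window functions hard to follow, here is a rough Python restatement of the same idea: walk the rows in (sID, start_date) order while carrying the largest end date seen so far. It is a sketch of the logic rather than a replacement for the query; the function name and sample rows are made up. Like the query, it uses <=, so periods that merely touch count as overlapping.

```python
from itertools import groupby
from operator import itemgetter

def rows_in_overlaps(rows):
    """Return the (sid, start, end) rows that overlap another row with the
    same sid, using one pass over the rows sorted by (sid, start)."""
    flagged = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        max_end_so_far = None  # running max of end over the rows seen so far
        for i, (sid, start, end) in enumerate(group):
            # Counterpart of range_start = 0: this row starts no later than
            # the latest end among the earlier rows.
            overlaps_earlier = max_end_so_far is not None and start <= max_end_so_far
            if max_end_so_far is None or end > max_end_so_far:
                max_end_so_far = end
            # Counterpart of next_range_start = 0: the next row starts no later
            # than the latest end seen so far (current row included).
            overlaps_later = i + 1 < len(group) and group[i + 1][1] <= max_end_so_far
            if overlaps_earlier or overlaps_later:
                flagged.append((sid, start, end))
    return flagged

# Hypothetical sample data with ISO-8601 date strings.
sample = [
    (1, "2000-01-01", "2000-12-31"),
    (1, "2000-03-01", "2000-04-30"),  # inside the first row
    (1, "2000-06-01", "2000-07-31"),  # overlaps the first row only
    (1, "2001-02-01", "2001-06-30"),  # isolated
    (2, "2000-01-01", "2000-12-31"),  # only row for this sid
]
result = rows_in_overlaps(sample)
print(result)  # the first three rows
```

The running maximum is what lets the third row be matched against the first even though the second row, its immediate predecessor, ended earlier.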