Question
I have the following source table, which contains records with the start and end timestamps of a person logging in and logging out:
employeeNumber | start_time | end_time
john | 10/02/2020 16.30.000 | 11/02/2020 02.00.000
john | 10/02/2020 20.00.000 | 10/02/2020 22.00.000
john | 10/02/2020 23.00.000 | 11/02/2020 01.00.000
rick | 10/02/2020 10.00.000 | 10/02/2020 11.00.000
rick | 10/02/2020 13.00.000 | 10/02/2020 14.30.000
tom | 10/02/2020 09.00.000 | 10/02/2020 18.00.000
As you can see, john has 3 overlapping records, rick has 2 non-overlapping records, and tom has only 1 record.
Hence, I want the result to look like this:
john | 10/02/2020 16.30.000 | 11/02/2020 02.00.000
rick | 10/02/2020 10.00.000 | 10/02/2020 11.00.000
rick | 10/02/2020 13.00.000 | 10/02/2020 14.30.000
tom | 10/02/2020 09.00.000 | 10/02/2020 18.00.000
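The expected result is the classic "merge overlapping intervals" problem, applied separately to each employee. As a reference for what the SQL should return, here is a minimal Python sketch (not SQL Server code) that sorts each employee's sessions by start time and keeps extending the current session while the next one starts before it ends. It assumes the dates are dd/mm/yyyy and rewrites them as ISO-8601 strings so that plain string comparison orders them correctly; `ROWS` and `merge_sessions` are hypothetical names.

```python
from itertools import groupby

# Question's sample data. Timestamps are rewritten as ISO-8601 strings,
# assuming the source dates are dd/mm/yyyy, so string order == time order.
ROWS = [
    ("john", "2020-02-10 16:30", "2020-02-11 02:00"),
    ("john", "2020-02-10 20:00", "2020-02-10 22:00"),
    ("john", "2020-02-10 23:00", "2020-02-11 01:00"),
    ("rick", "2020-02-10 10:00", "2020-02-10 11:00"),
    ("rick", "2020-02-10 13:00", "2020-02-10 14:30"),
    ("tom", "2020-02-10 09:00", "2020-02-10 18:00"),
]


def merge_sessions(rows):
    """Merge overlapping (employee, start, end) sessions per employee."""
    merged = []
    # Sorting by (employee, start, end) puts each employee's sessions in time order.
    for emp, sessions in groupby(sorted(rows), key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in sessions:
            if cur_end is not None and start <= cur_end:
                # Starts before (or exactly when) the current island ends: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the previous island (if any) and start a new one.
                if cur_end is not None:
                    merged.append((emp, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((emp, cur_start, cur_end))
    return merged


for row in merge_sessions(ROWS):
    print(row)
```

Sessions that merely touch (one ends exactly when the next starts) are merged as well; change `<=` to `<` if they should stay separate.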
After some R&D and a lot of help from @Gordon Linoff, the following SQL got me close to the result I want:
with e as (
    -- one row per employee and timestamp: +1 for each login, -1 for each logout
    select t1.*, s.final_inc
    from (
        select e.employeeNumber, v.dt, sum(v.inc) as inc
        from emp_data e
        cross apply (values (start_time, 1),
                            (end_time, -1)
                    ) v(dt, inc)
        group by e.employeeNumber, v.dt
    ) t1
    -- running total of inc up to and including t1.dt (used instead of sum() over (order by dt))
    outer apply (
        select sum(t2.inc) as final_inc
        from (
            select e.employeeNumber, v.dt, sum(v.inc) as inc
            from emp_data e
            cross apply (values (start_time, 1),
                                (end_time, -1)
                        ) v(dt, inc)
            group by e.employeeNumber, v.dt
        ) t2
        where t2.employeeNumber = t1.employeeNumber and
              t2.dt <= t1.dt
    ) s
)
select employeeNumber, min(dt) as start_datetime, max(dt) as end_datetime
from (select e.*,
             -- grp: number of points for this employee, up to and including dt, where the running total is 0
             (select sum(case when e2.final_inc = 0 then 1 else 0 end)
              from e e2
              where e2.employeeNumber = e.employeeNumber and
                    e2.dt <= e.dt
             ) as grp
      from e
     ) e
where final_inc <> 0
group by employeeNumber, grp;
Here is the DB fiddle with the queries I have used so far. In the fiddle, the second query is the one suggested by @Gordon. However, my SQL Server database has compatibility level 100, which does not support ORDER BY together with SUM() OVER, so in my next query I used OUTER APPLY instead.
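The OUTER APPLY workaround computes, for each timestamp, the sum of `inc` over all of that employee's timestamps up to and including it, which is the same running total that `SUM(inc) OVER (PARTITION BY employeeNumber ORDER BY dt)` would produce. The sketch below checks that equivalence in SQLite through Python's built-in `sqlite3` module. SQLite has no APPLY operator, so the correlated subquery sits in the SELECT list, and the window-function query needs SQLite 3.25 or later. The table `ev` is a hypothetical stand-in for the question's `t1` derived table, filled from part of the sample data (ISO timestamps, assuming dd/mm/yyyy).

```python
import sqlite3

# Part of the question's sample data (ISO timestamps, assuming dd/mm/yyyy).
ROWS = [
    ("john", "2020-02-10 16:30", "2020-02-11 02:00"),
    ("john", "2020-02-10 20:00", "2020-02-10 22:00"),
    ("john", "2020-02-10 23:00", "2020-02-11 01:00"),
    ("rick", "2020-02-10 10:00", "2020-02-10 11:00"),
    ("rick", "2020-02-10 13:00", "2020-02-10 14:30"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ev (employeeNumber TEXT, dt TEXT, inc INTEGER)")
# +1 for every login, -1 for every logout (no timestamp repeats per employee here,
# so the GROUP BY step of the original query is not needed for this demo).
conn.executemany(
    "INSERT INTO ev VALUES (?, ?, ?)",
    [(emp, start, 1) for emp, start, _ in ROWS] + [(emp, end, -1) for emp, _, end in ROWS],
)

# Running total via a correlated subquery, as the OUTER APPLY in the question does.
CORRELATED = """
SELECT t1.employeeNumber, t1.dt,
       (SELECT SUM(t2.inc) FROM ev AS t2
         WHERE t2.employeeNumber = t1.employeeNumber AND t2.dt <= t1.dt) AS final_inc
FROM ev AS t1
ORDER BY t1.employeeNumber, t1.dt
"""

# The same running total as an ordered window aggregate.
WINDOWED = """
SELECT employeeNumber, dt,
       SUM(inc) OVER (PARTITION BY employeeNumber ORDER BY dt) AS final_inc
FROM ev
ORDER BY employeeNumber, dt
"""

correlated = conn.execute(CORRELATED).fetchall()
windowed = conn.execute(WINDOWED).fetchall()
conn.close()

print(correlated == windowed)
print([row[2] for row in correlated])
```

The correlated subquery scans the employee's rows once per timestamp, so it is quadratic per employee; that is usually acceptable for login data but worth keeping in mind for large tables.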
My query with OUTER APPLY now gives me the following output:
john | 10/02/2020 16.30.000 | 11/02/2020 01.00.000
rick | 10/02/2020 10.00.000 | 10/02/2020 10.00.000
tom | 10/02/2020 09.00.000 | 10/02/2020 09.00.000
rick | 10/02/2020 13.00.000 | 10/02/2020 13.00.000
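The output above can be reproduced by mirroring each step of the query in plain Python, which makes the intermediate `final_inc` and `grp` values visible. This is a minimal sketch that follows the query's logic on the question's sample data (ISO timestamps, assuming dd/mm/yyyy); `annotate` and `query_as_written` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

# Question's sample data (ISO timestamps, assuming dd/mm/yyyy).
ROWS = [
    ("john", "2020-02-10 16:30", "2020-02-11 02:00"),
    ("john", "2020-02-10 20:00", "2020-02-10 22:00"),
    ("john", "2020-02-10 23:00", "2020-02-11 01:00"),
    ("rick", "2020-02-10 10:00", "2020-02-10 11:00"),
    ("rick", "2020-02-10 13:00", "2020-02-10 14:30"),
    ("tom", "2020-02-10 09:00", "2020-02-10 18:00"),
]


def annotate(rows):
    """Return (employee, dt, final_inc, grp) rows, as the CTE and grp subquery compute them."""
    # CROSS APPLY (VALUES ...) + GROUP BY: net +1/-1 per (employee, dt).
    inc = defaultdict(int)
    for emp, start, end in rows:
        inc[(emp, start)] += 1   # login
        inc[(emp, end)] -= 1     # logout
    points = sorted(inc)
    # OUTER APPLY: running total up to and including dt.
    final = {p: sum(inc[q] for q in points if q[0] == p[0] and q[1] <= p[1]) for p in points}
    # grp as written: zero points at or before dt (note the <=).
    return [(emp, dt, final[(emp, dt)],
             sum(1 for q in points if q[0] == emp and q[1] <= dt and final[q] == 0))
            for emp, dt in points]


def query_as_written(rows):
    """Apply WHERE final_inc <> 0, then MIN/MAX per (employee, grp)."""
    bounds = {}
    for emp, dt, final_inc, grp in annotate(rows):
        if final_inc != 0:
            lo, hi = bounds.get((emp, grp), (dt, dt))
            bounds[(emp, grp)] = (min(lo, dt), max(hi, dt))
    return sorted((emp, lo, hi) for (emp, _), (lo, hi) in bounds.items())


for row in annotate(ROWS):
    print(row)
print(query_as_written(ROWS))
```

For rick, `final_inc` runs 1, 0, 1, 0 and `grp` runs 0, 1, 1, 2: each logout point has a running total of 0, so the WHERE clause drops it, and the `<=` comparison has already moved it into the next group.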
So I am faced with two issues:
- For the two rows for rick and the one row for tom, the result repeats the start_time in both the start and end columns.
- For john, the query correctly returns a single record with start time 10/02/2020 16.30.000, but the end time it picks is 11/02/2020 01.00.000, whereas it should be 11/02/2020 02.00.000.
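Both symptoms are consistent with the last step of the query: `WHERE final_inc <> 0` discards each island's closing timestamp (the point where the running total returns to 0), and counting zero points with `e2.dt <= e.dt` assigns that closing row to the following group. One possible adjustment, sketched here under that reading and not taken from any posted answer, is to count only zero points strictly before `dt` and drop the filter. The SQL uses only derived tables, UNION ALL and correlated subqueries (no ordered window aggregates), and is run here in SQLite through Python's `sqlite3` module; SQLite lacks CROSS APPLY, so UNION ALL replaces the VALUES trick. It has not been tested on SQL Server, and `MERGE_SQL` and `merged_sessions` are hypothetical names.

```python
import sqlite3

# Question's sample data (ISO timestamps, assuming dd/mm/yyyy).
ROWS = [
    ("john", "2020-02-10 16:30", "2020-02-11 02:00"),
    ("john", "2020-02-10 20:00", "2020-02-10 22:00"),
    ("john", "2020-02-10 23:00", "2020-02-11 01:00"),
    ("rick", "2020-02-10 10:00", "2020-02-10 11:00"),
    ("rick", "2020-02-10 13:00", "2020-02-10 14:30"),
    ("tom", "2020-02-10 09:00", "2020-02-10 18:00"),
]

# Same +1/-1 running-total idea as the question, with two changes:
#   * no WHERE final_inc <> 0, so each island keeps its closing timestamp;
#   * grp counts zero points strictly before dt (<), so the closing row
#     stays in its own island instead of starting the next one.
MERGE_SQL = """
WITH ev AS (
    SELECT employeeNumber, dt, SUM(inc) AS inc
    FROM (SELECT employeeNumber, start_time AS dt, 1 AS inc FROM emp_data
          UNION ALL
          SELECT employeeNumber, end_time, -1 FROM emp_data) AS x
    GROUP BY employeeNumber, dt
),
running_total AS (
    SELECT e.employeeNumber, e.dt,
           (SELECT SUM(e2.inc) FROM ev AS e2
             WHERE e2.employeeNumber = e.employeeNumber
               AND e2.dt <= e.dt) AS final_inc
    FROM ev AS e
),
islands AS (
    SELECT r.employeeNumber, r.dt,
           (SELECT COUNT(*) FROM running_total AS r2
             WHERE r2.employeeNumber = r.employeeNumber
               AND r2.dt < r.dt
               AND r2.final_inc = 0) AS grp
    FROM running_total AS r
)
SELECT employeeNumber, MIN(dt) AS start_datetime, MAX(dt) AS end_datetime
FROM islands
GROUP BY employeeNumber, grp
ORDER BY employeeNumber, start_datetime
"""


def merged_sessions(rows):
    """Load rows into an in-memory emp_data table and run MERGE_SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp_data (employeeNumber TEXT, start_time TEXT, end_time TEXT)")
    conn.executemany("INSERT INTO emp_data VALUES (?, ?, ?)", rows)
    result = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return result


for row in merged_sessions(ROWS):
    print(row)
```

On SQL Server the same two changes could be applied to the query in the question itself: replace `e2.dt <= e.dt` with `e2.dt < e.dt` in the grp subquery and remove `where final_inc <> 0`.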
Any help is appreciated.
Source: https://stackoverflow.com/questions/60869021/getting-distinct-rows-for-overlapping-timestamp-sql-server