Question
I have a table called sickness, which is a record of when an employee is off work sick. It looks like this:
Date_Sick  Employee_Number
---------- ----------------
2020-06-08 001
2020-06-10 001
2020-06-11 001
2020-06-12 001
2020-06-08 002
2020-06-09 002
What I'm trying to do is add a new column with a unique ID to identify a unique instance of absence. A unique instance of absence is one that runs in consecutive weekdays with no breaks. Hence my output table should look like this:
Date_Sick  Employee_Number  Sickness_ID
---------- ---------------- -----------
2020-06-08 001              1
2020-06-10 001              2
2020-06-11 001              2
2020-06-12 001              2
2020-06-08 002              3
2020-06-09 002              3
I've tried creating various partitions using LEAD/LAG to check whether the next date is only 1 day away, but I'm failing to get it to work.
AMENDMENT: This also needs to factor in only the days an individual would be working, which I can add to the table. For any date I can add a flag of 'Y' or 'N' to state whether the employee would be expected to be in the office, so weekends would typically be 'N'.
Any ideas?
Answer 1:
This is a gaps-and-islands problem.
Here, I think the simplest approach is row_number() and date arithmetic:
select date_sick, employee_number,
    dense_rank() over(order by employee_number, dateadd(day, -rn, date_sick)) as sickness_id
from (
    select s.*,
        row_number() over(partition by employee_number order by date_sick) as rn
    from sickness s
) s
order by employee_number, date_sick
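This works by subtracting an incrementing row number (per employee, ordered by date) from date_sick. Within a run of consecutive dates that difference stays constant, so dense_rank() over the employee number and the difference yields one id per run.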
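To see why this groups the rows, it can help to look at the intermediate values. The sketch below reuses the same inner query and exposes the row number and the shifted date; the column name island_key is only an illustrative label, and the sketch assumes date_sick is a date type with at most one row per employee per date.

```sql
-- Expose the values that dense_rank() is ordering by (T-SQL syntax).
select date_sick, employee_number, rn,
    dateadd(day, -rn, date_sick) as island_key  -- constant within a run of consecutive dates
from (
    select s.*,
        row_number() over(partition by employee_number order by date_sick) as rn
    from sickness s
) s
order by employee_number, date_sick;

-- For the sample data in the question:
-- date_sick   employee_number  rn  island_key
-- 2020-06-08  001              1   2020-06-07
-- 2020-06-10  001              2   2020-06-08
-- 2020-06-11  001              3   2020-06-08
-- 2020-06-12  001              4   2020-06-08
-- 2020-06-08  002              1   2020-06-07
-- 2020-06-09  002              2   2020-06-07
```

Ranking the distinct (employee_number, island_key) pairs in order gives the ids 1, 2 and 3.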
Demo on DB Fiddle - with credits to Larnu for generating the DDL in the first place:
date_sick  | employee_number | sickness_id
:--------- | :-------------- | ----------:
2020-06-08 | 001             |           1
2020-06-10 | 001             |           2
2020-06-11 | 001             |           2
2020-06-12 | 001             |           2
2020-06-08 | 002             |           3
2020-06-09 | 002             |           3
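Note that the query only treats calendar-consecutive dates as one absence, so a Friday followed by the next Monday would receive two ids. One possible way to account for the working-day flag from the question's amendment is sketched below. It assumes a hypothetical table work_calendar(employee_number, cal_date, expected_in_office) with one row per employee per calendar date, covering every date in sickness; that table and its column names are not part of the original post. The idea is to number each employee's working days and then subtract the sick-day row number from that working-day number instead of from the date itself.

```sql
-- Sketch only: work_calendar and its columns are assumed, not from the original post.
with working_days as (
    -- Number each employee's expected working days 1, 2, 3, ... in date order.
    select employee_number, cal_date,
        row_number() over(partition by employee_number order by cal_date) as work_day_no
    from work_calendar
    where expected_in_office = 'Y'
),
numbered as (
    -- Sick days on consecutive working days share the same island_key.
    select s.date_sick, s.employee_number,
        w.work_day_no
            - row_number() over(partition by s.employee_number order by s.date_sick) as island_key
    from sickness s
    inner join working_days w
        on  w.employee_number = s.employee_number
        and w.cal_date = s.date_sick
)
select date_sick, employee_number,
    dense_rank() over(order by employee_number, island_key) as sickness_id
from numbered
order by employee_number, date_sick;
```

Because of the inner join, any sickness row recorded on a day flagged 'N' is dropped from the result; whether that is the desired behaviour depends on how such rows should be treated.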
Source: https://stackoverflow.com/questions/65124418/tracking-a-continuous-instance-of-absence-sql