Question
I'm trying to create a table that takes the dates on which an employee is sick and adds a new column providing a "sickness ID", which identifies a single instance of absence spanning several dates. I've managed to do this; however, I now need to factor in a table containing each employee's working pattern, which tells me whether someone was due in work on a given day of the week.
This can be joined using the day_no column in both tables, along with the employee_number.
I posted this question earlier and had a great solution from @GMB; however, I need to add the working hours.
I have a table called sickness, which looks like this:
date_sick day_no day_name employee_number hours_lost working_hours
2020-07-14 2 Tuesday 001 7.5 7.5
2020-07-15 3 Wednesday 001 7.5 7.5
2020-07-16 4 Thursday 001 7.5 7.5
2020-07-17 5 Friday 001 7.5 7.5
2020-07-21 2 Tuesday 001 7.5 7.5
2020-07-22 3 Wednesday 001 7.5 7.5
2020-07-23 4 Thursday 001 7.5 7.5
2020-07-24 5 Friday 001 7.5 7.5
2020-07-28 2 Tuesday 001 7.5 7.5
2020-07-29 3 Wednesday 001 7.5 7.5
2020-07-30 4 Thursday 001 7.5 7.5
2020-07-31 5 Friday 001 7.5 7.5
2020-09-09 3 Wednesday 001 7.5 7.5
2020-09-10 4 Thursday 001 7.5 7.5
2020-07-22 3 Wednesday 002 8 8
2020-07-23 4 Thursday 002 8 8
And my working hours table looks like this:
employee_number day_no working_hours
001 1 0
001 2 7.5
001 3 7.5
001 4 7.5
001 5 7.5
001 6 0
001 7 0
002 1 8
002 2 8
002 3 8
002 4 8
002 5 8
002 6 0
002 7 0
Using the following statement, I can assign a sickness ID that identifies a single instance of absence over consecutive dates, unique to both the employee and the dates of absence:
IF OBJECT_ID('dbo.sickness ', 'u') IS NOT NULL DROP TABLE dbo.sickness
CREATE TABLE dbo.sickness (date_sick date
, day_no int
, day_name varchar(10)
, employee_number char(5)
, hours_lost float
, working_hours float)
INSERT INTO dbo.sickness (date_sick, day_no, day_name, Employee_Number, Hours_Lost, Working_Hours)
VALUES
('2020-07-14', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-15', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-16', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-17', '5', 'Friday', '001', '7.5', '7.5'),
('2020-07-21', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-22', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-23', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-24', '5', 'Friday', '001', '7.5', '7.5'),
('2020-07-28', '2', 'Tuesday', '001', '7.5', '7.5'),
('2020-07-29', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-07-30', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-31', '5', 'Friday', '001', '7.5', '7.5'),
('2020-09-09', '3', 'Wednesday', '001', '7.5', '7.5'),
('2020-09-10', '4', 'Thursday', '001', '7.5', '7.5'),
('2020-07-22', '3', 'Wednesday', '002', '8', '8'),
('2020-07-23', '4', 'Thursday', '002', '8', '8')
GO
IF OBJECT_ID('dbo.working_hours ', 'u') IS NOT NULL DROP TABLE dbo.working_hours
CREATE TABLE dbo.working_hours (employee_number char(5)
, day_no int
, working_hours float)
INSERT INTO dbo.working_hours (employee_number, day_no, working_hours)
VALUES
('001', '1', '0'),
('001', '2', '7.5'),
('001', '3', '7.5'),
('001', '4', '7.5'),
('001', '5', '7.5'),
('001', '6', '0'),
('001', '7', '0'),
('002', '1', '8'),
('002', '2', '8'),
('002', '3', '8'),
('002', '4', '8'),
('002', '5', '8'),
('002', '6', '0'),
('002', '7', '0');
WITH CTE AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY employee_number ORDER BY date_sick) AS rn
FROM dbo.sickness s)
SELECT c.date_sick,
c.day_no,
c.day_name,
c.employee_number,
c.hours_lost,
w.working_hours,
DENSE_RANK() OVER (ORDER BY C.employee_number, DATEADD(DAY, -C.rn, C.date_sick)) AS sickness_id
FROM CTE C
JOIN working_hours w
ON c.employee_number = w.employee_number
AND c.day_no = w.day_no
ORDER BY C.employee_number,
C.date_sick
DROP TABLE dbo.sickness
DROP TABLE dbo.working_hours
This outputs the following table:
date_sick day_no day_name employee_number hours_lost working_hours sickness_id
2020-07-14 2 Tuesday 001 7.5 7.5 1
2020-07-15 3 Wednesday 001 7.5 7.5 1
2020-07-16 4 Thursday 001 7.5 7.5 1
2020-07-17 5 Friday 001 7.5 7.5 1
2020-07-21 2 Tuesday 001 7.5 7.5 2
2020-07-22 3 Wednesday 001 7.5 7.5 2
2020-07-23 4 Thursday 001 7.5 7.5 2
2020-07-24 5 Friday 001 7.5 7.5 2
2020-07-28 2 Tuesday 001 7.5 7.5 3
2020-07-29 3 Wednesday 001 7.5 7.5 3
2020-07-30 4 Thursday 001 7.5 7.5 3
2020-07-31 5 Friday 001 7.5 7.5 3
2020-09-09 3 Wednesday 001 7.5 7.5 4
2020-09-10 4 Thursday 001 7.5 7.5 4
2020-07-22 3 Wednesday 002 8 8 5
2020-07-23 4 Thursday 002 8 8 5
The issue with this is that it's grouping the consecutive days but only ones that are within the same week. The first 12 rows should all have the same sickness ID. What I want is the following table:
date_sick day_no day_name employee_number hours_lost working_hours sickness_id
2020-07-14 2 Tuesday 001 7.5 7.5 1
2020-07-15 3 Wednesday 001 7.5 7.5 1
2020-07-16 4 Thursday 001 7.5 7.5 1
2020-07-17 5 Friday 001 7.5 7.5 1
2020-07-21 2 Tuesday 001 7.5 7.5 1
2020-07-22 3 Wednesday 001 7.5 7.5 1
2020-07-23 4 Thursday 001 7.5 7.5 1
2020-07-24 5 Friday 001 7.5 7.5 1
2020-07-28 2 Tuesday 001 7.5 7.5 1
2020-07-29 3 Wednesday 001 7.5 7.5 1
2020-07-30 4 Thursday 001 7.5 7.5 1
2020-07-31 5 Friday 001 7.5 7.5 1
2020-09-09 3 Wednesday 001 7.5 7.5 2
2020-09-10 4 Thursday 001 7.5 7.5 2
2020-07-22 3 Wednesday 002 8 8 3
2020-07-23 4 Thursday 002 8 8 3
Any ideas? Maybe connecting it to a calendar table?
Answer 1:
As I mentioned in the comments, just use a WHERE. This is, of course, a blind guess due to a lack of sample data (the sample has no working hours data):
--I prefer CTEs over subqueries
WITH CTE AS(
SELECT s.date_sick,
s.day_no,
s.employee_number,
ROW_NUMBER() OVER (PARTITION BY s.employee_number ORDER BY s.date_sick) AS rn
FROM dbo.sickness s)
SELECT C.date_sick,
C.employee_number,
DENSE_RANK() OVER (ORDER BY C.employee_number, DATEADD(DAY, -C.rn, C.date_sick)) AS sickness_id,
wh.working_hours
FROM CTE C
JOIN dbo.working_hours wh ON C.employee_number = wh.employee_number
AND C.day_no = wh.day_no --match the weekday too, otherwise each sick day joins to all seven pattern rows
WHERE wh.working_hours > 0
ORDER BY C.employee_number,
C.date_sick;
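The calendar idea raised in the question can be combined with the ROW_NUMBER() technique above. What follows is a minimal sketch, not a tested production query: it numbers each employee's working days rather than calendar days, so weekends and other non-working days no longer break a run. It assumes that dbo.sickness and dbo.working_hours exist as defined in the question, that day_no 1 means Monday (as the sample data implies), and that each employee has at most one sickness row per date. The names calendar, work_days, numbered, work_day_seq and grp are made up for illustration.

```sql
-- Date range to cover; the variables are illustrative.
DECLARE @start date = (SELECT MIN(date_sick) FROM dbo.sickness);
DECLARE @end   date = (SELECT MAX(date_sick) FROM dbo.sickness);

WITH calendar AS (
    -- One row per calendar date between the first and last sick date.
    SELECT @start AS cal_date
    UNION ALL
    SELECT DATEADD(DAY, 1, cal_date) FROM calendar WHERE cal_date < @end
),
work_days AS (
    -- Keep only the dates each employee is due in, numbered consecutively.
    -- 1900-01-01 was a Monday, so this expression yields 1 = Monday .. 7 = Sunday
    -- regardless of the DATEFIRST setting.
    SELECT w.employee_number,
           c.cal_date,
           ROW_NUMBER() OVER (PARTITION BY w.employee_number ORDER BY c.cal_date) AS work_day_seq
    FROM calendar c
    JOIN dbo.working_hours w
      ON w.day_no = DATEDIFF(DAY, '19000101', c.cal_date) % 7 + 1
    WHERE w.working_hours > 0
),
numbered AS (
    -- Within one spell, the working-day number and the sick-row number rise together,
    -- so their difference stays constant.
    SELECT s.date_sick, s.day_no, s.day_name, s.employee_number, s.hours_lost, s.working_hours,
           wd.work_day_seq
           - ROW_NUMBER() OVER (PARTITION BY s.employee_number ORDER BY s.date_sick) AS grp
    FROM dbo.sickness s
    JOIN work_days wd
      ON wd.employee_number = s.employee_number
     AND wd.cal_date = s.date_sick
)
SELECT date_sick, day_no, day_name, employee_number, hours_lost, working_hours,
       DENSE_RANK() OVER (ORDER BY employee_number, grp) AS sickness_id
FROM numbered
ORDER BY employee_number, date_sick
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion levels
```

Because of the inner join to work_days, a sickness row recorded on a day with zero working hours would be dropped; switch to a left join and handle the NULL if such rows can occur. A permanent calendar table, if one is available, could replace the recursive CTE.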
Answer 2:
I think that using lag() to see if the sickness days are consecutive and then a cumulative sum is a better approach for assigning the sickness id.
I am a little unclear on what you want exactly. But here is one approach:
select date_sick, employee_number,
sum(case when working_hours > 0 and prev_working_hours > 0 and
dateadd(day, -1, date_sick) = prev_date_sick
then 0 else 1
end) over (partition by employee_number order by date_sick) as sickness_id
from (select s.date_sick, s.employee_number, wh.working_hours,
lag(s.date_sick) over (partition by s.employee_number order by s.date_sick) as prev_date_sick,
lag(wh.working_hours) over (partition by s.employee_number order by s.date_sick) as prev_working_hours
from sickness s left join
working_hours wh
on s.employee_number = wh.employee_number and -- join on the employee and weekday keys
s.day_no = wh.day_no
) s
order by employee_number, date_sick
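The query above only treats dates exactly one day apart as consecutive, so a weekend still starts a new ID. Here is a hedged sketch of how the same lag() plus cumulative-sum idea might be extended: a row starts a new spell only if the employee was due in on some day strictly between the previous sick date and the current one. It assumes day_no 1 means Monday, as the sample data implies, and the names prev, flagged, offset_days and starts_new_spell are illustrative. Checking the first seven days after the previous sick date is enough, because every weekday that occurs in the gap occurs within those seven days.

```sql
WITH prev AS (
    SELECT s.*,
           LAG(s.date_sick) OVER (PARTITION BY s.employee_number
                                  ORDER BY s.date_sick) AS prev_date_sick
    FROM dbo.sickness s
),
flagged AS (
    SELECT p.*,
           CASE
               -- Same spell: there is a previous sick date and no scheduled
               -- working day falls strictly between it and this date.
               WHEN p.prev_date_sick IS NOT NULL
                AND NOT EXISTS (
                        SELECT 1
                        FROM (VALUES (1),(2),(3),(4),(5),(6),(7)) AS n(offset_days)
                        JOIN dbo.working_hours w
                          ON w.employee_number = p.employee_number
                             -- 1900-01-01 was a Monday: 1 = Monday .. 7 = Sunday
                         AND w.day_no = DATEDIFF(DAY, '19000101',
                                        DATEADD(DAY, n.offset_days, p.prev_date_sick)) % 7 + 1
                        WHERE DATEADD(DAY, n.offset_days, p.prev_date_sick) < p.date_sick
                          AND w.working_hours > 0
                    )
               THEN 0
               ELSE 1
           END AS starts_new_spell
    FROM prev p
)
SELECT date_sick, day_no, day_name, employee_number, hours_lost, working_hours,
       SUM(starts_new_spell) OVER (PARTITION BY employee_number
                                   ORDER BY date_sick
                                   ROWS UNBOUNDED PRECEDING) AS sickness_id
FROM flagged
ORDER BY employee_number, date_sick;
```

As in the query above, the IDs restart at 1 for each employee, so a spell is identified by employee_number together with sickness_id. If a single global numbering is needed, one option is to wrap the result in DENSE_RANK() OVER (ORDER BY employee_number, sickness_id).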
Source: https://stackoverflow.com/questions/65126701/tracking-continuous-days-of-absence-from-work-days-only-sql