问题
I need help computing a date difference across different rows with variable lag (specifically, rows that are not on the same day) without subqueries, joins, etc. I think this should be possible with some inline t-SQL aggregates that use OVER(PARTITION BY)
clause, such as LAG
, DENSE_RANK
, etc., but I can't quite put a finger on it. This is for a SQL Server 2017 Developer's edition.
A clarifying example:
Consider a dataset with Job beginning and end dates (across various projects). Some jobs start and end on the same day (such as jobs 2 & 3, 4 & 5). I need to compute the idle time between consequent jobs that started on different days (per project). That is the days between last job's ending time and current job's beginning time. If the previous job started on the same day, then look further back in history of the same project. I.e. the jobs that started on the same day can be considered as parts of the same job.
UPDATE: I simplified the code/output by dropping time values (question's history has original dataset).
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t(Prj TINYINT, Beg DATE, Eñd DATE);
INSERT INTO #t SELECT 1, '1/1/17', '1/2/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/15/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/18/17';
INSERT INTO #t SELECT 1, '1/20/17', '1/24/17';
INSERT INTO #t SELECT 2, '2/2/17', '2/5/17';
INSERT INTO #t SELECT 2, '2/7/17', '2/9/17';
ALTER TABLE #t ADD Job INT NOT NULL IDENTITY (1,1) PRIMARY KEY;
A LAG(.,1)
function uses precisely the previous job's ending time, which is not what I want. It yields incorrect idle duration for jobs 2 & 3, 4 & 5. Jobs 2 & 3 should both use the ending time of job 1. Jobs 4 & 5 should both use the ending time of job 3. The joined query computes idle duration correctly, but an inline calculation is desirable here (without joins, subqueries).
SELECT c.Job, c.Prj, c.Beg, c.Eñd,
-- in-line computation with OVER clause
PrvEñd_lg=LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg),
Idle_lg=DATEDIFF(DAY, LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg), c.Beg),
-- calculation over current and (joined) previous records
PrvEñd_j=MAX(p.Eñd),
IdleDur_j=DATEDIFF(DAY, MAX(p.Eñd), c.Beg)
FROM #t c LEFT JOIN #t p ON c.Prj=p.Prj AND c.Beg > p.Eñd
GROUP BY c.Job, c.Prj, c.Beg, c.Eñd
ORDER BY c.Prj, c.Beg
Job Prj Beg Eñd PrvEñd_lg Idle_lg PrvEñd_j IdleDur_j
1 1 2017-01-01 2017-01-02 NULL NULL NULL NULL
2 1 2017-01-05 2017-01-07 2017-01-02 3 2017-01-02 3
3 1 2017-01-05 2017-01-07 2017-01-07 -2 2017-01-02 3
4 1 2017-01-15 2017-01-15 2017-01-07 8 2017-01-07 8
5 1 2017-01-15 2017-01-18 2017-01-15 0 2017-01-07 8
6 1 2017-01-20 2017-01-24 2017-01-18 2 2017-01-18 2
7 2 2017-02-02 2017-02-05 NULL NULL NULL NULL
8 2 2017-02-07 2017-02-09 2017-02-05 2 2017-02-05 2
Please let me know, if I can further clarify any specific details.
Many thanks!
回答1:
You can use a self-join.
select a.Job
, a.Prj
, a.Beg
, a.Eñd
, max(b.Eñd) as PrevEñd
, min(datediff(mi, b.Eñd, a.Beg) / (60*24.0)) as IdleDur
from #t as a
left join #t as b on a.Prj = b.Prj
and cast(a.Beg as date) > cast(b.Eñd as date)
group by a.Job
, a.Prj
, a.Beg
, a.Eñd
This produces the following output:
+-----+-----+---------------------+---------------------+---------------------+-----------+
| Job | Prj | Beg | Eñd | PrevEñd | IdleDur |
+-----+-----+---------------------+---------------------+---------------------+-----------+
| 1 | 1 | 2017-01-01 01:00:00 | 2017-01-02 02:00:00 | NULL | NULL |
| 2 | 1 | 2017-01-05 02:00:00 | 2017-01-07 03:00:00 | 2017-01-02 02:00:00 | 3.0000000 |
| 3 | 1 | 2017-01-05 03:00:00 | 2017-01-07 02:00:00 | 2017-01-02 02:00:00 | 3.0416666 |
| 4 | 1 | 2017-01-15 04:00:00 | 2017-01-15 03:00:00 | 2017-01-07 03:00:00 | 8.0416666 |
| 5 | 1 | 2017-01-15 15:00:00 | 2017-01-18 03:00:00 | 2017-01-07 03:00:00 | 8.5000000 |
| 6 | 1 | 2017-01-20 05:00:00 | 2017-01-24 02:00:00 | 2017-01-18 03:00:00 | 2.0833333 |
| 7 | 2 | 2017-02-02 06:00:00 | 2017-02-05 03:00:00 | NULL | NULL |
| 8 | 2 | 2017-02-07 07:00:00 | 2017-02-09 02:00:00 | 2017-02-05 03:00:00 | 2.1666666 |
+-----+-----+---------------------+---------------------+---------------------+-----------+
来源:https://stackoverflow.com/questions/47720899/compute-lag-difference-for-different-days