Question
I am trying to build a query that analyzes data in our time tracking system. Every time a user swipes in or out, the system adds a row recording the swipe time and whether the user went on or off site (entry or exit). In the example below, user Joe Bloggs has 4 rows, which I want to pair up to calculate the total time he spent on site.
The problem is that some records are not as easy to pair. In the example, the second user has two consecutive 'on' rows, so I need a way to ignore repeated 'on' or 'off' rows.
ID  | Time                    | OnOffSite | UserName   |
--------------------------------------------------------
123 | 2011-10-25 09:00:00.000 | on        | Bloggs Joe |
124 | 2011-10-25 12:00:00.000 | off       | Bloggs Joe |
125 | 2011-10-25 13:00:00.000 | on        | Bloggs Joe |
126 | 2011-10-25 17:00:00.000 | off       | Bloggs Joe |
127 | 2011-10-25 09:00:00.000 | on        | Jonesy Ian |
128 | 2011-10-25 10:00:00.000 | on        | Jonesy Ian |
129 | 2011-10-25 11:00:00.000 | off       | Jonesy Ian |
130 | 2011-10-25 12:00:00.000 | on        | Jonesy Ian |
131 | 2011-10-25 15:00:00.000 | off       | Jonesy Ian |
My system is MS SQL Server 2005. The reporting period for the query is monthly.
Can anyone suggest a solution? My data is already grouped in a table by UserName and time, and the ID field is an identity column.
Answer 1:
-- =====================
-- sample data
-- =====================
declare @t table
(
    ID int,
    Time datetime,
    OnOffSite varchar(3),
    UserName varchar(50)
)
insert into @t values(123, '2011-10-25 09:00:00.000', 'on', 'Bloggs Joe')
insert into @t values(124, '2011-10-25 12:00:00.000', 'off', 'Bloggs Joe')
insert into @t values(125, '2011-10-25 13:00:00.000', 'on', 'Bloggs Joe')
insert into @t values(126, '2011-10-25 17:00:00.000', 'off', 'Bloggs Joe')
insert into @t values(127, '2011-10-25 09:00:00.000', 'on', 'Jonesy Ian')
insert into @t values(128, '2011-10-25 10:00:00.000', 'on', 'Jonesy Ian')
insert into @t values(129, '2011-10-25 11:00:00.000', 'off', 'Jonesy Ian')
insert into @t values(130, '2011-10-25 12:00:00.000', 'on', 'Jonesy Ian')
insert into @t values(131, '2011-10-25 15:00:00.000', 'off', 'Jonesy Ian')
-- =====================
-- solution
-- =====================
select
    -- DATEDIFF(hh, ...) counts the hour boundaries crossed between the two times
    UserName, timeon, timeoff, diffinhours = DATEDIFF(hh, timeon, timeoff)
from
(
    select
        UserName,
        -- an 'on' row (k = 2) lands in the same rn + k group as the 'off' row that follows it (k = 1)
        timeon = max(case when k = 2 and OnOffSite = 'on' then Time end),
        timeoff = max(case when k = 1 and OnOffSite = 'off' then Time end)
    from
    (
        select
            ID,
            UserName,
            OnOffSite,
            Time,
            -- renumber the surviving rows, which now alternate on/off for each user
            rn = ROW_NUMBER() over(partition by username order by id)
        from
        (
            select
                ID,
                UserName,
                OnOffSite,
                Time,
                rn2 = case OnOffSite
                    -- '(..order by id)' takes earliest 'on' in the sequence of 'on's
                    -- to take the latest use '(...order by id desc)'
                    when 'on' then
                        ROW_NUMBER() over(partition by UserName, OnOffSite, rn1 order by id)
                    -- '(... order by id desc)' takes the latest 'off' in the sequence of 'off's
                    -- to take the earliest use '(...order by id)'
                    when 'off' then
                        ROW_NUMBER() over(partition by UserName, OnOffSite, rn1 order by id desc)
                end,
                rn1
            from
            (
                select
                    *,
                    -- rn1 is constant within each run of consecutive rows
                    -- that share the same OnOffSite value for a user
                    rn1 = ROW_NUMBER() over(partition by username order by id) +
                          ROW_NUMBER() over(partition by username, onoffsite order by id desc)
                from @t
            ) t
        ) t
        -- keep a single row from each run of repeated 'on' or 'off' swipes
        where rn2 = 1
    ) t1
    cross join
    (
        select k = 1 union select k = 2
    ) t2
    group by UserName, rn + k
) t
where timeon is not null or timeoff is not null
order by username
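
The query above returns one row per on/off pair, while the question asks for a total per user for a monthly report. Here is a minimal sketch of that last step, assuming the pairs have been saved into a hypothetical table variable @pairs (filled by hand here with the rows the query should return for the sample data). It sums minutes rather than hours because DATEDIFF counts the boundaries crossed, so DATEDIFF(hh, ...) would report 1 hour for a visit from 09:59 to 10:01.

```sql
-- Hypothetical holding table for the (timeon, timeoff) pairs; not part of the original answer.
declare @pairs table
(
    UserName varchar(50),
    timeon   datetime,
    timeoff  datetime
)
insert into @pairs values('Bloggs Joe', '2011-10-25 09:00:00.000', '2011-10-25 12:00:00.000')
insert into @pairs values('Bloggs Joe', '2011-10-25 13:00:00.000', '2011-10-25 17:00:00.000')
insert into @pairs values('Jonesy Ian', '2011-10-25 09:00:00.000', '2011-10-25 11:00:00.000')
insert into @pairs values('Jonesy Ian', '2011-10-25 12:00:00.000', '2011-10-25 15:00:00.000')

select
    UserName,
    -- first day of the month in which the visit started
    monthstart = dateadd(month, datediff(month, 0, timeon), 0),
    -- sum whole minutes, then convert to hours
    totalhours = sum(datediff(minute, timeon, timeoff)) / 60.0
from @pairs
-- skip pairs that are missing one side
where timeon is not null and timeoff is not null
group by UserName, dateadd(month, datediff(month, 0, timeon), 0)
order by UserName
```

For the sample data this should give 7 hours for Bloggs Joe and 5 hours for Jonesy Ian. Visits that span midnight at a month end are counted in the month they started, which may or may not suit the report.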
Answer 2:
First you need to talk with the business side and decide on a set of matching rules.
After that I suggest that you add a status field to the table where you record the status of each row (matched, unmatched, deleted, etc.). Whenever a row is added, you should try to match it with an existing row to make a pair. A successful match sets the status of both rows to matched; otherwise the new row is stored as unmatched.
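
A rough sketch of the match-on-insert idea, written to run on SQL Server 2005. Everything in it is an assumption for illustration: the MatchStatus column name, the table variable @swipes, and above all the matching rule (pair a new 'off' with the earliest unmatched 'on' of the same user), which is exactly the kind of rule that would have to be agreed with the business side first.

```sql
-- Hypothetical table: the question's columns plus a MatchStatus column.
declare @swipes table
(
    ID          int,
    Time        datetime,
    OnOffSite   varchar(3),
    UserName    varchar(50),
    MatchStatus varchar(10)   -- 'matched' or 'unmatched'
)
insert into @swipes values(127, '2011-10-25 09:00:00.000', 'on', 'Jonesy Ian', 'unmatched')
insert into @swipes values(128, '2011-10-25 10:00:00.000', 'on', 'Jonesy Ian', 'unmatched')

-- A new 'off' swipe arrives.
declare @newId int, @newTime datetime, @user varchar(50), @matchId int
set @newId = 129
set @newTime = '2011-10-25 11:00:00.000'
set @user = 'Jonesy Ian'

-- Assumed rule: earliest unmatched 'on' for the same user before the new swipe.
-- @matchId stays null when no such row exists.
select top 1 @matchId = ID
from @swipes
where UserName = @user
  and OnOffSite = 'on'
  and MatchStatus = 'unmatched'
  and Time < @newTime
order by Time, ID

-- Store the new row as matched or unmatched depending on the lookup.
insert into @swipes values(@newId, @newTime, 'off', @user,
    case when @matchId is null then 'unmatched' else 'matched' end)

-- Mark the partner row; this updates nothing when @matchId is null.
update @swipes set MatchStatus = 'matched' where ID = @matchId

select * from @swipes order by ID
```

With this rule, row 127 is paired with row 129 and the repeated 'on' in row 128 stays unmatched, so a report can total the matched pairs and list the unmatched rows for review.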
Source: https://stackoverflow.com/questions/7899408/sql-server-find-datediff-between-different-rows-sum