问题
I am a school teacher very new to MS SQL server. Everyone is suggesting to try this site out. Here goes!
I am trying to write queries to test different types of outcome measures for the participation in a academic program. There are several different ways to calculate this outcome measurement I would like to try. The outcome which I am trying to Calculate is: What is the % of participants that are retained during six months of the program? I am testing different ways to define participant and different time ranges. There are 4 queries I am trying to produce. Unfortunately, I have to use for different tables: Attendance, Status, Deenrolled, Inactive. I have included sample data from each below
Queries
- A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at July 1st 2012 and Ending June 30th 2013, so the length of the fiscal year.If a participant is deenrolled or inactive they are dropped.
- A participant is defined as everyone that attended a class at least twice a week for 6 months (181 days total) starting at January 1st 2013. If a participant is deenrolled or becomes inactive they are dropped.
- A participant is defined as everyone that attended a class at least twice a week starting at January 1st 2013 until today
- A participant is defined as a student's enrollment start date until they are deenrolled or become inactive.
Participant (Numerator) participant / all students which were served (Denominator)
The 4 query outputs I am looking for are different versions of this:
Example
Participants Served Percent_Served
75 100 75%
I have been messing around with different versions of the query below
SELECT
Count (distinct ID) as Count,
Count ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) > 2 as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive)
or ID not in (select ID from Deenrolled)
GROUP BY ID
and
SELECT
Count (distinct ID) as Count,
Count ( DATEADD( dd, -181, DATEADD(wk, DATEDIFF(wk,0,Date), 0)) - Enrolled_Date as Participants ,
FROM Attendance
where Attendence_date date between '07/01/2012' and '06/30/2013'
and ID not in (Select ID from Inactive)
or ID not in (select ID from Deenrolled)
GROUP BY ID
Any programming assistance for these queries is greatly appreciated.
Below are the sample/example datasets.
Attendence_date is the date a student participated in one class.
CREATE TABLE Attendance (
ID int,
Attendence_date datetime
)
INSERT INTO Attendance VALUES
(4504498, '7/1/2012'),
(4504498, '7/2/2012'),
(4504498, '7/3/2012'),
(4504498, '7/4/2012'),
(4504498, '7/5/2012'),
(4504498, '7/8/2012'),
(4504498, '7/9/2012'),
(4504498, '7/10/2012'),
(4504498, '7/11/2012'),
(4504498, '7/12/2012'),
(4504498, '7/1/2012'),
(4504498, '7/2/2012'),
(4504498, '7/3/2012'),
(4504498, '7/4/2012'),
(4504498, '7/5/2012'),
(4504498, '7/8/2012'),
(4504498, '7/9/2012'),
(4504498, '7/10/2012'),
(4504498, '7/11/2012'),
(4504498, '7/12/2012'),
(9201052, '7/15/2012'),
(9201052, '7/16/2012'),
(9201052, '7/17/2012'),
(9201052, '7/17/2012'),
(9201052, '7/18/2012'),
(7949745, '7/17/2012'),
(7949745, '7/18/2012'),
(7949745, '7/23/2012'),
(7949745, '7/23/2012'),
(7949745, '7/24/2012'),
(7949745, '7/26/2012'),
(7949745, '7/26/2012'),
(7949745, '8/8/2012'),
(7949745, '8/8/2012'),
(7949745, '11/5/2012'),
(7949745, '11/5/2012'),
(7949745, '11/5/2012'),
(7949745, '11/6/2012'),
(7949745, '11/6/2012'),
(7949745, '11/6/2012'),
(7949745, '11/7/2012'),
(7949745, '11/7/2012'),
(7949745, '11/7/2012')
Here is the contains the enrollment date.
CREATE TABLE [Status] (
ID int,
Intake_Date datetime ,
Engaged_Date datetime ,
Enrolled_Date datetime)
INSERT INTO [Status] VALUES
(7949745, '3/7/2012', '7/17/2012', '3/8/2012'),
(4504498, '2/21/2013', '3/5/2013', '3/22/2013'),
(1486279, '4/18/2013', '5/7/2013', '5/20/2013'),
(9201052, '5/15/2012', '7/13/2012', '5/15/2012'),
(1722390, '3/5/2012', '8/27/2012', '3/8/2012'),
(7735695, '9/7/2012', '9/7/2012', '9/28/2012'),
(9261549, '3/7/2012', '7/24/2012', '3/8/2012'),
(3857008, '3/15/2013', '3/18/2013', '4/3/2013'),
(8502583, '3/14/2013', '4/15/2013', '5/3/2013'),
(1209774, '4/19/2012', '1/1/2012', '4/24/2012')
Here is the contains the de-enrollment date.
CREATE TABLE Deenrolled (
ID int,
Deenrolled_Date datetime)
INSERT INTO Deenrolled VALUES
(7949745, '2/4/2013'),
(5485272, '07/08/2013'),
(8955628, '01/10/2013'),
(5123221, '7/8/2013'),
(5774753, '7/18/2013'),
(3005451, '2/18/2013'),
(7518818, '05/29/2013'),
(9656985, '6/20/2013'),
(2438101, '7/17/2013'),
(1437052, '7/25/2013'),
(9133874, '4/25/2013'),
(7007375, '6/19/2013'),
(3178181, '5/24/2013')
And inactive
CREATE TABLE Inactive (
ID int,
Effect_Date datetime)
INSERT INTO Inactive VALUES
(1209774, '10/12/2012'),
(5419494, '10/12/2012'),
(4853049, '10/9/2012'),
(1453678, '5/23/2013'),
(1111554, '7/16/2012'),
(5564128, '2/15/2013'),
(1769234, '7/16/2012')
回答1:
Well I should say that is not an easy one. The main problem was to solve 'at least twice a week for a sixth months' part - it's easy to calculate twice a week, but it should be 6 continous months!
While I've tried to solve it, I've found absolutely brilliant answer by Niels van der Rest - Finding continuous ranges in a set of numbers. So I'll give you general query for the Part 1, you can change parameters and get result for Part 2:
declare @Weeks int, @PerWeek int, @StartDate date, @EndDate date, @count
select
@StartDate = '20120701',
@EndDate = '20130630',
@Weeks = 26, -- 6 month or 26 weeks
@PerWeek = 2 -- twice per week
select @count = count(distinct A.ID)
from Attendance as A
where
A.Attendence_date between @StartDate and @EndDate and
A.ID not in (select T.ID from Deenrolled as T) and
A.ID not in (select T.ID from Inactive as T)
;with CTE as (
-- Week numbers, filter by dates
select
A.ID,
datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
from Attendance as A
where
A.Attendence_date between @StartDate and @EndDate and
A.ID not in (select T.ID from Deenrolled as T) and
A.ID not in (select T.ID from Inactive as T)
), CTE2 as (
-- Group by week, filter less then @PerWeek per week, calculate row number
select
Wk, ID,
row_number() over (partition by ID order by Wk) as Row_Num
from CTE
group by Wk, ID
having count(*) >= @PerWeek
)
-- Final query - group by difference between week and row_number
select 100 * cast(count(distinct ID) as float) / @count
from CTE2
group by ID, Wk - Row_Num
having count(*) >= @Weeks
I've created SQL FIDDLE EXAMPLE, you can test the query.
Part 3 is easy
declare @PerWeek int, @StartDate date
select
@StartDate = '20130101',
@PerWeek = 2 -- twice per week
select @count = count(distinct A.ID)
from Attendance as A
where
A.Attendence_date >= @StartDate and
A.ID not in (select T.ID from Deenrolled as T) and
A.ID not in (select T.ID from Inactive as T)
;with CTE as (
-- Week numbers, filter by dates
select
A.ID,
datediff(dd, @StartDate, A.Attendence_date) / 7 as Wk
from Attendance as A
where
A.Attendence_date >= @StartDate and
A.ID not in (select T.ID from Deenrolled as T) and
A.ID not in (select T.ID from Inactive as T)
), CTE2 as (
-- Group by week, filter less then @PerWeek per week
select distinct ID
from CTE
group by Wk, ID
having count(*) >= @PerWeek
)
select 100 * cast(count(*) as float) / @count from CTE2
Part 4 seems a bit unclear for me, could you clarify?
回答2:
Give these a shot (changed because I missed a huge part of the question)
SELECT B.ID FROM
(SELECT number
FROM master.dbo.spt_values
WHERE TYPE = 'P' AND number < datediff(week, '07/01/2012', '06/30/2013') ) AS W
JOIN
(SELECT A.ID, weeknum
FROM
(SELECT ID, datediff(week, '07/01/2012',Attendence_date) AS weeknum
FROM Attendance
WHERE Attendence_date BETWEEN '07/01/2012' AND '06/30/2013'
AND ID NOT IN (SELECT ID FROM Deenrolled)
AND ID NOT IN (SELECT ID FROM Inactive)) AS A
GROUP BY A.ID, A.weeknum
HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');
SELECT B.ID FROM
(SELECT number
FROM master.dbo.spt_values
WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
JOIN
(SELECT A.ID, weeknum
FROM
(SELECT ID, datediff(week, '01/01/2013',Attendence_date) AS weeknum
FROM Attendance
WHERE Attendence_date BETWEEN '01/01/2013' AND '06/30/2013'
AND ID NOT IN (SELECT ID FROM Deenrolled)
AND ID NOT IN (SELECT ID FROM Inactive)) AS A
GROUP BY A.ID, A.weeknum
HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', '06/30/2013');
SELECT B.ID FROM
(SELECT number
FROM master.dbo.spt_values
WHERE TYPE = 'P' AND number < datediff(week, '01/01/2013', '06/30/2013') ) AS W
JOIN
(SELECT A.ID, weeknum
FROM
(SELECT ID, datediff(week, '01/01/2013',GetDate()) AS weeknum
FROM Attendance
WHERE Attendence_date BETWEEN '01/01/2013' AND GetDate()
AND ID NOT IN (SELECT ID FROM Deenrolled)
AND ID NOT IN (SELECT ID FROM Inactive)) AS A
GROUP BY A.ID, A.weeknum
HAVING COUNT(A.ID) > 2) AS B ON W.number = B.weeknum
GROUP BY B.ID
HAVING COUNT(W.number) = datediff(week, '07/01/2012', GetDate());
SELECT DISTINCT(Attendance.ID)
FROM Attendance
WHERE Attendance.ID NOT IN (SELECT ID FROM Deenrolled)
AND ID NOT IN (SELECT ID FROM Inactive);
and an sqlfiddle to help you out: http://sqlfiddle.com/#!6/97230/3 Good luck!
来源:https://stackoverflow.com/questions/17933622/of-participants-that-are-retained-during-six-months