Question
I have this data in a table:
FIELD_A FIELD_B FIELD_D
249052903 10/15/2011 N
249052903 11/15/2011 P ------------- VALUE CHANGED
249052903 12/15/2011 P
249052903 1/15/2012 N ------------- VALUE CHANGED
249052903 2/15/2012 N
249052903 3/15/2012 N
249052903 4/15/2012 N
249052903 5/15/2012 N
249052903 6/15/2012 N
249052903 7/15/2012 N
249052903 8/15/2012 N
249052903 9/15/2012 N
Whenever the value in FIELD_D changes, it starts a new group, and I need the min and max dates in each group. The query should return:
FIELD_A GROUP_START GROUP_END
249052903 10/15/2011 10/15/2011
249052903 11/15/2011 12/15/2011
249052903 1/15/2012 9/15/2012
The examples I have seen so far assume the values in FIELD_D are unique. Here the values can repeat, as shown: first it is "N", then it changes to "P", and then back to "N".
Any help will be appreciated
Thanks
Answer 1:
You can use the analytic functions LAG, LEAD, and COUNT() OVER to your advantage, if your SQL implementation supports them.
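WITH EndsMarked AS (
    -- Flag the first and last row of each run of equal FIELD_D values.
    -- PARTITION BY keeps rows of different FIELD_A values from being compared.
    SELECT
        FIELD_A,
        FIELD_B,
        CASE WHEN FIELD_D = LAG(FIELD_D, 1) OVER (PARTITION BY FIELD_A ORDER BY FIELD_B)
             THEN 0 ELSE 1 END AS IS_START,
        CASE WHEN FIELD_D = LEAD(FIELD_D, 1) OVER (PARTITION BY FIELD_A ORDER BY FIELD_B)
             THEN 0 ELSE 1 END AS IS_END
    FROM T
), GroupsNumbered AS (
    -- A running count of start flags gives every run its own number.
    SELECT
        FIELD_A,
        FIELD_B,
        IS_START,
        IS_END,
        COUNT(CASE WHEN IS_START = 1 THEN 1 END)
            OVER (PARTITION BY FIELD_A ORDER BY FIELD_B) AS GroupNum
    FROM EndsMarked
    WHERE IS_START = 1 OR IS_END = 1
)
SELECT
    FIELD_A,
    MIN(FIELD_B) AS GROUP_START,
    MAX(FIELD_B) AS GROUP_END
FROM GroupsNumbered
GROUP BY FIELD_A, GroupNum;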
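The numbering step in the query above can be traced by hand outside the database. The following is only a rough Python sketch of the same idea, not a replacement for the SQL: it assumes a shortened copy of the question's rows, already sorted by date and written as ISO strings, and the names `rows`, `is_start`, `group_num`, and `bounds` are made up for illustration. Unlike the query, it counts over all rows instead of only the start and end rows, which gives the same group numbers.

```python
from itertools import accumulate

# Shortened sample from the question as (FIELD_B, FIELD_D), in date order.
rows = [
    ("2011-10-15", "N"), ("2011-11-15", "P"), ("2011-12-15", "P"),
    ("2012-01-15", "N"), ("2012-02-15", "N"), ("2012-03-15", "N"),
]

# IS_START: 1 when the value differs from the previous row (the LAG comparison).
is_start = [
    1 if i == 0 or rows[i][1] != rows[i - 1][1] else 0
    for i in range(len(rows))
]

# GroupNum: running count of the start flags (the COUNT(...) OVER step).
group_num = list(accumulate(is_start))

# MIN and MAX of FIELD_B per group number; rows are sorted, so the first
# date seen is the start and the latest date seen is the end.
bounds = {}
for (day, _value), g in zip(rows, group_num):
    start, _end = bounds.get(g, (day, day))
    bounds[g] = (start, day)
```

Here `is_start` comes out as 1, 1, 0, 1, 0, 0, so the running count assigns the rows to groups 1, 2, 2, 3, 3, 3.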
Answer 2:
This is fairly easy to express in SQL using subqueries:
select Field_A, Field_D, min(Field_B) as Group_Start, max(Field_B) as Group_End
from (select t.*,
             (select min(field_B)
              from t t2
              where t2.field_A = t.field_A and
                    t2.field_B > t.field_B and
                    t2.Field_D <> t.field_D
             ) as TheGroup
      from t
     ) t
group by Field_A, Field_D, TheGroup
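This assigns a group identifier using a correlated subquery. For each row, the identifier is the first later value of Field_B at which Field_D differs from that row's value, so all rows in the same run share it. Rows in the final run have no later change and get NULL, which still groups them together.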
You don't mention the database you are using, so this uses standard SQL.
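To try the correlated-subquery approach without a database server, here is a small sketch that runs essentially the same query through Python's built-in `sqlite3` module. SQLite is only a stand-in, since the question does not name a database. The sketch also assumes the dates are stored as ISO-8601 text so that string comparison matches date order, uses a shortened copy of the sample data, and adds an `ORDER BY` so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field_a INTEGER, field_b TEXT, field_d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (249052903, "2011-10-15", "N"),
        (249052903, "2011-11-15", "P"),
        (249052903, "2011-12-15", "P"),
        (249052903, "2012-01-15", "N"),
        (249052903, "2012-02-15", "N"),
    ],
)

# the_group is the first later date whose field_d differs from this row's;
# it is NULL for the rows of the final run.
query = """
SELECT field_a, field_d, MIN(field_b) AS group_start, MAX(field_b) AS group_end
FROM (SELECT t.*,
             (SELECT MIN(t2.field_b)
              FROM t AS t2
              WHERE t2.field_a = t.field_a
                AND t2.field_b > t.field_b
                AND t2.field_d <> t.field_d) AS the_group
      FROM t) AS g
GROUP BY field_a, field_d, the_group
ORDER BY group_start
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this sample the query returns three rows: one for the single leading "N", one for the two "P" rows, and one for the trailing "N" rows.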
Answer 3:
Don't use SQL for this problem: it cannot be done in SQL with a single table scan, because it requires comparing records with each other. It would need a full table scan plus at least one self-join. A solution is trivial to implement in an imperative language and requires only a single table scan. Edit: a stored procedure would be best.
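As an illustration of the imperative approach described in this answer, a single pass could look roughly like the Python sketch below. It assumes the rows arrive as `(field_a, field_b, field_d)` tuples already sorted by `field_a` and then by date (for example from a cursor with an `ORDER BY`); the function name `group_ranges` and the ISO date strings are made up for the example.

```python
def group_ranges(rows):
    """Yield (field_a, group_start, group_end) for each run of equal field_d.

    Assumes rows are (field_a, field_b, field_d) tuples sorted by field_a,
    then by field_b.
    """
    current = None  # [field_a, start, end, field_d] of the run in progress
    for field_a, field_b, field_d in rows:
        if current and current[0] == field_a and current[3] == field_d:
            current[2] = field_b  # same run: extend its end date
        else:
            if current:
                yield tuple(current[:3])  # the previous run is finished
            current = [field_a, field_b, field_b, field_d]
    if current:
        yield tuple(current[:3])  # emit the final run


rows = [
    (249052903, "2011-10-15", "N"),
    (249052903, "2011-11-15", "P"),
    (249052903, "2011-12-15", "P"),
    (249052903, "2012-01-15", "N"),
    (249052903, "2012-02-15", "N"),
]
ranges = list(group_ranges(rows))
```

Each row is visited once and only the current run is kept in memory, but the input must already be in order, so any sorting cost moves to whatever produces the rows.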
Source: https://stackoverflow.com/questions/15649530/find-start-and-end-dates-when-one-field-changes