I have the following data set.
create table t1 (
dept number,
date1 date
);
Table created.
insert into t1 values (100, '01-jan-2013');
insert into t1 values (100, '02-jan-2013');
insert into t1 values (200, '03-jan-2013');
insert into t1 values (100, '04-jan-2013');
commit;
MY goal is to create a rank column that resets each time the department changes. The closest column that I can use for "partition by" clause is dept, but that won't give me the desired result.
SQL> select * from t1;
DEPT DATE1
---------- ---------
100 01-JAN-13
100 02-JAN-13
200 03-JAN-13
100 04-JAN-13
select dept,
date1,
rank () Over (partition by dept order by date1) rnk
from t1
order by date1;
DEPT DATE1 RNK
---------- --------- ----------
100 01-JAN-13 1
100 02-JAN-13 2
200 03-JAN-13 1
100 04-JAN-13 3
The desired output is as follows. The last rnk=1 is becuase the Jan-04 record is the first record after the change.
DEPT DATE1 RNK
---------- --------- ----------
100 01-JAN-13 1
100 02-JAN-13 2
200 03-JAN-13 1
100 04-JAN-13 1 <<<----------
Any pointers?
This is a little complicated. Instead of using rank()
or the like, use lag()
to see when something changes. Then do a cumulative sum of the flag.
select dept, date1,
CASE WHEN StartFlag = 0 THEN 1
ELSE 1+StartFlag+NVL(lag(StartFlag) over (order by date1),0)
END as rnk
from (select t1.*,
(case when dept = lag(dept) over (order by date1)
then 1
else 0
end) as StartFlag
from t1
) t1
order by date1;
Here is the SQLFiddle.
EDIT:
This is Gordon editing my own answer. Oops. The original query was 90% of the way there. It identified the groups where the numbers should increase, but did not assign the numbers within the groups. I would do this with another level of row_number()
as in:
select dept, date1,
row_number() over (partition by dept, grp order by date1) as rnk
from (select dept, date1, startflag,
sum(StartFlag) over (partition by dept order by date1) as grp
from (select t1.*,
(case when dept = lag(dept) over (order by date1)
then 0
else 1
end) as StartFlag
from t1
) t1
) t1
order by date1;
So, the overall idea is the following. First use lag()
to determine where a group begins (that is, where there is a department change from one date to the next). Then, assign a "group id" to these, by doing a cumulative sum. These are the records that are to be enumerated. The final step is to enumerate them using row_number()
.
This could have been a case for model
clause, but unfortunately it dramatically underperforms on significant amount of rows compared to Gordon's query.
select dept, date1, rank from t1
model
dimension by ( row_number() over(order by date1) as rn )
measures( 1 as rank, dept, date1 )
rules (
rank[1] = 1,
rank[rn > 1] =
case dept[cv()]
when dept[cv()-1] then rank[cv()-1] + 1
else 1
end
)
The approach is:
- Mark each row with a 'row number' and a 'changed' flag
- Calculate the final 'rnk' as the difference between the 'row number', and the maximum previous 'row number' that corresponds to a 'changed' row.
This is similar to Gordon's answer, but written using a CTE which I find easier to read.
with cte as (
select dept, date1,
row_number() over (order by date1) as row,
case when dept = (lag(dept) over (order by date1)) then 0 else 1 end as changed
from t1
)
select dept, date1,
row - max(case when changed = 1 then row else 1 end) over (order by date1) + 1 as rnk
from cte
order by date1
来源:https://stackoverflow.com/questions/17824432/oracle-analytic-functions-resetting-a-windowing-clause