问题
I have the following dataset:
id | date | state
-----------------------
1 | 01/01/17 | high
1 | 02/01/17 | high
1 | 03/01/17 | high
1 | 04/01/17 | miss
1 | 05/01/17 | high
2 | 01/01/17 | miss
2 | 02/01/17 | high
2 | 03/01/17 | high
2 | 04/01/17 | miss
2 | 05/01/17 | miss
2 | 06/01/17 | high
I want to create a column rank_state
which ranks, within groups of id
, the entries as per increasing date
(starting from rank 0) which do not have the state
of "miss". Furthermore, the rank repeats itself if the entry has a state
of "miss". The output should look like:
id | date | state | rank_state
------------------------------------
1 | 01/01/17 | high | 0
1 | 02/01/17 | high | 1
1 | 03/01/17 | high | 2
1 | 04/01/17 | miss | 2
1 | 05/01/17 | high | 3
2 | 01/01/17 | miss | 0
2 | 02/01/17 | high | 0
2 | 03/01/17 | high | 1
2 | 04/01/17 | miss | 1
2 | 05/01/17 | miss | 1
2 | 06/01/17 | high | 2
For example, the 4th row has a rank of 2 since it's state
is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should have rank 0.
I have tried the following:
,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank
and
,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank
but neither give me the desired result. Any ideas would be very helpful.
回答1:
More than likely you want:
SELECT *,
GREATEST(
COUNT(case when state != 'miss' then 1 else null end)
OVER(PARTITION BY id ORDER BY date) - 1,
0
) as "state_rank"
FROM tbl;
SQL Fiddle
Basically:
- make your window frame (partition) over
id
- only count the ones that aren't
'miss'
- because it could be a negative number if starting the record, you can slap on the
GREATEST
to use 0 (preventing negatives)
回答2:
Just add frame_clause to vol7ron's answer since Redshift requires it :
select *
, GREATEST(COUNT(case when state != 'miss' then 1 else null end)
OVER(PARTITION BY id order by date rows between unbounded preceding and current row) -1 , 0 ) as state_rank
from tbl;
来源:https://stackoverflow.com/questions/43164608/rank-rows-in-a-column-under-conditions-on-a-different-column