Rank rows in a column under conditions on a different column

拈花ヽ惹草 提交于 2020-01-17 13:09:15

问题


I have the following dataset:

 id |   date   | state
-----------------------
  1 | 01/01/17 | high
  1 | 02/01/17 | high
  1 | 03/01/17 | high
  1 | 04/01/17 | miss
  1 | 05/01/17 | high
  2 | 01/01/17 | miss
  2 | 02/01/17 | high
  2 | 03/01/17 | high
  2 | 04/01/17 | miss
  2 | 05/01/17 | miss
  2 | 06/01/17 | high

I want to create a column rank_state which ranks, within groups of id, the entries as per increasing date (starting from rank 0) which do not have the state of "miss". Furthermore, the rank repeats itself if the entry has a state of "miss". The output should look like:

 id |   date   | state | rank_state
------------------------------------
  1 | 01/01/17 | high  |    0
  1 | 02/01/17 | high  |    1
  1 | 03/01/17 | high  |    2
  1 | 04/01/17 | miss  |    2
  1 | 05/01/17 | high  |    3
  2 | 01/01/17 | miss  |    0
  2 | 02/01/17 | high  |    0
  2 | 03/01/17 | high  |    1
  2 | 04/01/17 | miss  |    1
  2 | 05/01/17 | miss  |    1
  2 | 06/01/17 | high  |    2

For example, the 4th row has a rank of 2 since it's state is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should have rank 0.

I have tried the following: ,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank and ,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank but neither give me the desired result. Any ideas would be very helpful.


回答1:


More than likely you want:

SELECT *, 
       GREATEST(
         COUNT(case when state != 'miss' then 1 else null end) 
           OVER(PARTITION BY id ORDER BY date) - 1,
         0
       ) as "state_rank"
FROM tbl;

SQL Fiddle

Basically:

  • make your window frame (partition) overid
  • only count the ones that aren't 'miss'
  • because it could be a negative number if starting the record, you can slap on the GREATEST to use 0 (preventing negatives)



回答2:


Just add frame_clause to vol7ron's answer since Redshift requires it :

select *
    , GREATEST(COUNT(case when state != 'miss' then 1 else null end) 
            OVER(PARTITION BY id order by date rows between unbounded preceding and current row) -1 , 0 ) as state_rank
from tbl;


来源:https://stackoverflow.com/questions/43164608/rank-rows-in-a-column-under-conditions-on-a-different-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!