Drop rows identified within moving time window

和自甴很熟 submitted on 2020-01-16 00:36:14

Question


I have a dataset of hospitalisations ('spells'), with one row per spell. I want to drop any spell recorded within a week after another one (there could be several), the rationale being that such spells are likely symptomatic of the same underlying cause. Here is some play data:

create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);

insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);

Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.

For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.

The issue is with patient 1. Spell date 9 is identified for dropping because it is close to spell date 5; however, spell date 5 is close to spell date 1, so it should itself be dropped, which allows spell date 9 to live...
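To make the chaining issue concrete, here is a minimal Python sketch of the naive rule (drop a type 2 spell whenever any earlier spell lies within 7 days), run on the play data above. The function name `naive_keep` and the `window` parameter are made up for illustration; this is not part of the SQL being asked about.

```python
from collections import defaultdict

# Play data from the question: (patid, eventdate, type)
ROWS = [
    (1, 1, 1), (1, 3, 2), (1, 5, 2), (1, 9, 2), (1, 14, 2),
    (2, 1, 1), (2, 5, 1), (2, 19, 2),
]

def naive_keep(rows, window=7):
    """Hypothetical helper: drop a type 2 spell if ANY earlier spell of the
    same patient lies within `window` days, ignoring whether that earlier
    spell was itself dropped."""
    by_patient = defaultdict(list)
    for patid, eventdate, spell_type in sorted(rows):
        by_patient[patid].append((eventdate, spell_type))

    kept = []
    for patid, spells in by_patient.items():
        for eventdate, spell_type in spells:
            # True if some earlier spell is at most `window` days before this one
            has_recent = any(0 < eventdate - other <= window for other, _ in spells)
            if spell_type == 1 or not has_recent:
                kept.append((patid, eventdate))
    return kept

naive_result = naive_keep(ROWS)
print(naive_result)
```

With this rule, patient 1 keeps only date 1: date 9 is dropped because of date 5, even though date 5 is itself dropped. That is why the decision for each row has to depend on which earlier rows survived.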

So it seems to be a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to picture how to do it. Can anyone help? I should add that I'm using Teradata, which has more restrictions than most on recursive SQL (only UNION ALL set operations are allowed, I believe).


Answer 1:


This is cursor-style logic: each row is checked in turn against your rules, and the outcome depends on the rows before it, so recursion is the easiest (maybe the only) way to solve your problem.

To get decent performance you need a volatile table to facilitate this row-by-row processing:

CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn) AS
(
SELECT r.*
   ,ROW_NUMBER() -- needed to facilitate the join
    OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;

WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
 (
   SELECT vt.*
     ,eventdate AS startdate 
   FROM vt
   WHERE rn = 1 -- start with the first row

   UNION ALL

   SELECT vt.*
     -- check if type = 1 or more than 7 days after the current start date
     ,CASE WHEN vt.eventdate > cte.startdate + 7  
             OR vt.exac_type = 1
           THEN vt.eventdate   -- new start date
           ELSE cte.startdate  -- keep old date
      END
   FROM vt JOIN cte
     ON vt.patid = cte.patid
    AND vt.rn = cte.rn + 1 -- proceed to next row
 )    
SELECT * 
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate
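
The row-by-row logic of the recursive query above can be mirrored in a few lines of Python, which may help when checking the expected output. This is only a sketch of the same idea, not a replacement for the SQL; `keep_spells` and `window` are illustrative names, and `startdate` plays the same role as the column in the CTE.

```python
from itertools import groupby

# Play data from the question: (patid, eventdate, type)
ROWS = [
    (1, 1, 1), (1, 3, 2), (1, 5, 2), (1, 9, 2), (1, 14, 2),
    (2, 1, 1), (2, 5, 1), (2, 19, 2),
]

def keep_spells(rows, window=7):
    """Hypothetical helper: scan each patient's spells in date order and
    carry the start date of the last kept spell from row to row."""
    kept = []
    for patid, group in groupby(sorted(rows), key=lambda r: r[0]):
        startdate = None  # start date of the most recently kept spell
        for _, eventdate, spell_type in group:
            # Keep the first row, any type 1 row, and any row that falls
            # more than `window` days after the current start date.
            if startdate is None or spell_type == 1 or eventdate > startdate + window:
                startdate = eventdate
                kept.append((patid, eventdate, spell_type))
    return kept

result = keep_spells(ROWS)
print(result)
```

On the play data this keeps dates 1 and 9 for patient 1 and all three rows for patient 2, which matches the result asked for in the question.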



Answer 2:


I think the key to solving this is getting the first date more than 7 days from the current date and then doing a recursive subquery:

with rrs as (
      select rrs.*,
             (select min(rrs2.eventdate)
              from hif_user.rzb_recurse_src rrs2
              where rrs2.patid = rrs.patid and
                    rrs2.eventdate > rrs.eventdate + 7
             ) as eventdate7
      from hif_user.rzb_recurse_src rrs
     ),
     recursive cte as (
      select patid, min(eventdate) as eventdate, min(eventdate7) as eventdate7
      from rrs -- the CTE above; the base table has no eventdate7 column
      group by patid
      union all
      select cte.patid, cte.eventdate7, rrs.eventdate7
      from cte join
           rrs
           on rrs.patid = cte.patid and
              rrs.eventdate = cte.eventdate7
    )
select cte.patid, cte.eventdate
from cte;

If you want additional columns, then join in the original table at the last step.
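
As a rough check of this second approach, the "jump to the first date more than 7 days later" idea can be sketched in Python on the play data from the question. The names `jump_keep` and `window` are invented for the illustration, and the sketch only follows the dates, as the query above does.

```python
# Play data from the question: (patid, eventdate, type)
ROWS = [
    (1, 1, 1), (1, 3, 2), (1, 5, 2), (1, 9, 2), (1, 14, 2),
    (2, 1, 1), (2, 5, 1), (2, 19, 2),
]

def jump_keep(rows, window=7):
    """Hypothetical helper: start at each patient's earliest date, then
    repeatedly jump to the earliest date more than `window` days later."""
    dates_by_patient = {}
    for patid, eventdate, _spell_type in rows:
        dates_by_patient.setdefault(patid, []).append(eventdate)

    kept = []
    for patid, dates in dates_by_patient.items():
        current = min(dates)
        while current is not None:
            kept.append((patid, current))
            later = [d for d in dates if d > current + window]
            current = min(later) if later else None  # stop when nothing is left
    return kept

jump_result = jump_keep(ROWS)
print(jump_result)
```

For patient 1 this gives dates 1 and 9, as required. For patient 2 it skips date 5, because the type column is never consulted, so the rule that type 1 spells always remain would need to be added separately if this approach is used.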



Source: https://stackoverflow.com/questions/25015937/drop-rows-identified-within-moving-time-window
