Oracle sql: filtering repeated rows that only differ by a tiny amount of time

问题

I have an Oracle table with event alarms, and due to a weird and unknown -to me- condition alarms get repeated sometimes so I was asked to create a service that will remove repeated alarms from that Oracle table.

The condition to consider an alarm (a row in the table) as repeated is that there is another one with the same exact PKN_EVENTNAME and RECEIVEDDATE only differs from the previous one by a tiny amount of time (let's say 10 seconds for example -up or down-).

What I'm trying to do first is to create an Oracle sql statement that will group all alarms by PKN_EVENTNAME separating in each group the ones that are repeated (for later deletion).

I think I'm on the way, but I'm stuck.

¿Any help please?

My sql so far:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id and                                              -- Not the same row
             trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)                 -- Same date
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10)  -- < 10 sec

Edit 1:

With @Tejash corrections I see different results in Visual Studio Oracle SQL browser, but I'm not being able to understand them. I don't see clear if results are already records to be deleted (repeated alarms) or what.

回答1:

You can utilize range clause of analytic functions:

with dups as (
  select t1.*
       , row_number() over (
           partition by PKN_EVENTNAME, RECEIVEDDATE
           order by id
         ) as dup
  from PARQUIMETERS_ALARMS t1
), nodups as (
  select * from dups where dup = 1
), t as (
  select nodups.ID, nodups.PKN_EVENTNAME, nodups.RECEIVEDDATE
       , count(*) over (
           partition by nodups.PKN_EVENTNAME
           order by nodups.RECEIVEDDATE
           range between interval '10' second preceding and current row
         ) as cnt
  from nodups
)
select * from t where cnt = 1

(UPDATED: CTEs dups and nodups were added after OP shown in comment there are duplicate tuples (PKN_EVENTNAME, RECEIVEDDATE).)

Explanation: after cleaning data passing through nodups CTE, the where condition filters only rows for which there is only single row in last 10 s (which is obviously the current row).

回答2:

You are missing t1.PKN_EVENTNAME = t2.PKN_EVENTNAME in your exists condition and there is one irrelevant condition in your exists clause.

Your query should look like this:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id   -- Not the same row                                         
             --trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)   -- this is not needed
             and t1.PKN_EVENTNAME = t2.PKN_EVENTNAME -- added this
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10) -- < 5 sec

回答3:

You can certainly write this using an exists. It is likely to be more efficient, however, to use an analytic function. Something like this

with alarms as (
  select pa.*,
         lag(pa.RECEIVEDDATE) over (partition by pa.pkn_eventName
                                        order by pa.recievedDate) prior_receivedDate
    from PARQUIMETERS_ALARMS pa
)
select *
  from alarms
 where receivedDate - prior_receivedDate <= interval '10' second;

Note that I'm factoring out the alarms subquery here so you can easily run that separately and see the data set with the additional prior_receivedDate data in it before you apply the filter condition looking for repeated rows. That can often be useful in debugging/ visualizing the data. But you're free to write the query with an inline view if that's easier for you.

来源：https://stackoverflow.com/questions/62770260/oracle-sql-filtering-repeated-rows-that-only-differ-by-a-tiny-amount-of-time

标签

sql

Oracle

datediff

date-difference