问题
I have an Oracle table with event alarms, and due to a weird and unknown -to me- condition alarms get repeated sometimes so I was asked to create a service that will remove repeated alarms from that Oracle table.
The condition to consider an alarm (a row in the table) as repeated is that there is another one with the same exact PKN_EVENTNAME and RECEIVEDDATE only differs from the previous one by a tiny amount of time (let's say 10 seconds for example -up or down-).
What I'm trying to do first is to create an Oracle sql statement that will group all alarms by PKN_EVENTNAME separating in each group the ones that are repeated (for later deletion).
I think I'm on the way, but I'm stuck.
¿Any help please?
My sql so far:
select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE
from PARQUIMETERS_ALARMS t1
where
exists
(select 'x'
from PARQUIMETERS_ALARMS t2
where t1.id <> t2.id and -- Not the same row
trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE) -- Same date
and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10) -- < 10 sec
Edit 1:
With @Tejash corrections I see different results in Visual Studio Oracle SQL browser, but I'm not being able to understand them. I don't see clear if results are already records to be deleted (repeated alarms) or what.
回答1:
You can utilize range
clause of analytic functions:
with dups as (
select t1.*
, row_number() over (
partition by PKN_EVENTNAME, RECEIVEDDATE
order by id
) as dup
from PARQUIMETERS_ALARMS t1
), nodups as (
select * from dups where dup = 1
), t as (
select nodups.ID, nodups.PKN_EVENTNAME, nodups.RECEIVEDDATE
, count(*) over (
partition by nodups.PKN_EVENTNAME
order by nodups.RECEIVEDDATE
range between interval '10' second preceding and current row
) as cnt
from nodups
)
select * from t where cnt = 1
(UPDATED: CTEs dups
and nodups
were added after OP shown in comment there are duplicate tuples (PKN_EVENTNAME, RECEIVEDDATE)
.)
Explanation: after cleaning data passing through nodups
CTE, the where
condition filters only rows for which there is only single row in last 10 s (which is obviously the current row).
回答2:
You are missing t1.PKN_EVENTNAME = t2.PKN_EVENTNAME
in your exists
condition and there is one irrelevant condition in your exists
clause.
Your query should look like this:
select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE
from PARQUIMETERS_ALARMS t1
where
exists
(select 'x'
from PARQUIMETERS_ALARMS t2
where t1.id <> t2.id -- Not the same row
--trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE) -- this is not needed
and t1.PKN_EVENTNAME = t2.PKN_EVENTNAME -- added this
and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10) -- < 5 sec
回答3:
You can certainly write this using an exists
. It is likely to be more efficient, however, to use an analytic function. Something like this
with alarms as (
select pa.*,
lag(pa.RECEIVEDDATE) over (partition by pa.pkn_eventName
order by pa.recievedDate) prior_receivedDate
from PARQUIMETERS_ALARMS pa
)
select *
from alarms
where receivedDate - prior_receivedDate <= interval '10' second;
Note that I'm factoring out the alarms
subquery here so you can easily run that separately and see the data set with the additional prior_receivedDate
data in it before you apply the filter condition looking for repeated rows. That can often be useful in debugging/ visualizing the data. But you're free to write the query with an inline view if that's easier for you.
来源:https://stackoverflow.com/questions/62770260/oracle-sql-filtering-repeated-rows-that-only-differ-by-a-tiny-amount-of-time