I have a table called PF_Temp that has the following structure:
Try this:
delete a
from PF_Temp a
inner join PF_Temp b
on b.firstname = a.firstname
and b.middlename = a.middlename
and b.lastname = a.lastname
and b.DOB = a.DOB
and b.address = a.address
and b.city = a.city
and b.state = a.state
and b.phone = a.phone
and b.validitydate > a.validitydate
Example at SQL Fiddle.
The above works by:
1. Selecting from PF_Temp (aliased as a) all records which have duplicates. At this stage we capture all records, since the record in a would match with itself in b.
2. Requiring that validitydate in b be greater than that in a. This both avoids the above issue of a record matching itself (the same record would have the same validity date) and ensures that there's no match if the record in a is the most recent, since there will be no record in b with a greater validity date.
3. Deleting everything returned from a; i.e. every record which has a duplicate with a later validity date.
If you want to only delete those duplicates with a specific last name, you do exactly what you said above; i.e. add the line where a.LastName like 'A%'.
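As a rough sketch of that last point, the same self-join delete with the last-name filter added might look like the following. The 'A%' pattern is taken from the question; everything else is the statement above unchanged.

```sql
-- Sketch: the self-join delete from above, limited to last names starting with 'A'.
-- Only older duplicates (rows with a later-dated match in b) are removed.
delete a
from PF_Temp a
inner join PF_Temp b
on b.firstname = a.firstname
and b.middlename = a.middlename
and b.lastname = a.lastname
and b.DOB = a.DOB
and b.address = a.address
and b.city = a.city
and b.state = a.state
and b.phone = a.phone
and b.validitydate > a.validitydate
where a.lastname like 'A%';
```

To check what would be removed first, the leading delete a can be swapped for select a.* while keeping the rest of the statement as is. A row with several later duplicates will then appear once per match.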
Update
You mention that some columns may contain nulls. Here's a revised version of the above that takes into account that null = null does not evaluate to true.
delete a
from PF_Temp a
inner join PF_Temp b
on ((b.firstname = a.firstname) or (b.firstname is null and a.firstname is null))
and ((b.middlename = a.middlename) or (b.middlename is null and a.middlename is null))
and ((b.lastname = a.lastname) or (b.lastname is null and a.lastname is null))
and ((b.DOB = a.DOB) or (b.DOB is null and a.DOB is null))
and ((b.address = a.address) or (b.address is null and a.address is null))
and ((b.city = a.city) or (b.city is null and a.city is null))
and ((b.state = a.state) or (b.state is null and a.state is null))
and ((b.phone = a.phone) or (b.phone is null and a.phone is null))
and b.validitydate > a.validitydate
An alternative to the above would be on coalesce(b.firstname,'') = coalesce(a.firstname,'') (repeating that pattern for all other matching fields); though that would mean that nulls and blanks were treated the same, and it wouldn't perform quite so well.
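Written out in full, the coalesce variant could look something like the sketch below. The column types are not given in the question, so this assumes the name, address, city, state and phone columns are character types and that DOB is a date. The fallback date '19000101' is an arbitrary placeholder chosen for illustration and would need to be a value that never occurs as a real DOB.

```sql
-- Sketch only: assumes character columns for everything except DOB (assumed date).
-- Adjust each fallback value to suit the real column type.
delete a
from PF_Temp a
inner join PF_Temp b
on coalesce(b.firstname,'') = coalesce(a.firstname,'')
and coalesce(b.middlename,'') = coalesce(a.middlename,'')
and coalesce(b.lastname,'') = coalesce(a.lastname,'')
-- '19000101' is a hypothetical stand-in date, assumed never to be a real DOB
and coalesce(b.DOB,'19000101') = coalesce(a.DOB,'19000101')
and coalesce(b.address,'') = coalesce(a.address,'')
and coalesce(b.city,'') = coalesce(a.city,'')
and coalesce(b.state,'') = coalesce(a.state,'')
and coalesce(b.phone,'') = coalesce(a.phone,'')
and b.validitydate > a.validitydate;
```

Wrapping the columns in coalesce generally prevents the join from using indexes on them, which is the main reason this form tends to be slower than the explicit is null checks.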
Alternative Method
A different approach, which is more forgiving of nulls, is to use a subquery to pull back all values, numbering each set with matching values, starting at 1 for the most recent validity date. We then delete all those rows which came back with numbers higher than 1; i.e. any which are duplicates with earlier validity dates.
delete TheDeletables
from
(
select *
, row_number() over (
partition by
firstname
, middlename
, lastname
, DOB
, address
, city
, state
, phone
order by validitydate desc
) rowid
from PF_Temp
) TheDeletables
where rowid > 1;
Demo SQL Fiddle.
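Before running the row_number() delete, it may be worth previewing which rows it would remove. A minimal sketch, reusing the same subquery as a plain select (the alias Numbered is just an illustrative name):

```sql
-- Sketch: list the rows the row_number() delete would remove, without deleting them.
select *
from
(
select *
, row_number() over (
partition by
firstname
, middlename
, lastname
, DOB
, address
, city
, state
, phone
order by validitydate desc
) rowid
from PF_Temp
) Numbered
where rowid > 1;
```

One difference from the self-join version: if two duplicates share exactly the same validitydate, row_number() still numbers them 1 and 2 in an arbitrary order, so one of them is removed, whereas the self-join keeps both because neither date is greater than the other.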