Deleting duplicate row that has earliest date

予麋鹿

I have a table called PF_Temp that has the following structure:

firstname
middlename
lastname
DOB
address

1 Answer

你的背包

2021-01-25 01:25
Try this:
```
delete a
from PF_Temp a
inner join PF_Temp b 
on  b.firstname = a.firstname 
and b.middlename = a.middlename
and b.lastname = a.lastname
and b.DOB = a.DOB
and b.address = a.address
and b.city = a.city
and b.state = a.state
and b.phone = a.phone
and b.validitydate > a.validitydate
```
Example at SQL Fiddle.

The above works by:
- Joining on all matching fields (everything except the validity date), which captures in `a` every record that has a duplicate. Taken alone, this condition would capture all records, since each record in `a` also matches itself in `b`.
- Requiring that the validitydate in `b` be greater than that in `a`. This rules out a record matching itself (the same record has the same validity date) and ensures there is no match when the record in `a` is the most recent, since `b` then holds no record with a greater validity date.
- Deleting every record returned in `a`; i.e. every record that has a duplicate with a later validity date.

If you want to delete only those duplicates with a specific last name, you do exactly what you said above; i.e. add the line `where a.LastName like 'A%'`.
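
To see the matching logic in action without touching real data, here is a minimal sketch that runs the same join against an in-memory SQLite database from Python. The sample rows are invented for illustration, the dates are stored as ISO text (which is what makes the `>` comparison work here), and because SQLite does not accept the `delete a from ... join` form, the delete goes through `rowid` instead. Treat it as a way to preview the rows, not as the SQL Server statement itself.

```python
import sqlite3

# Columns that define a duplicate, as used in the join above.
MATCH_COLS = ["firstname", "middlename", "lastname", "DOB",
              "address", "city", "state", "phone"]

# Invented sample rows: two copies of Ann Adams, one unique Bob Brown.
ROWS = [
    ("Ann", "B", "Adams", "1980-01-01", "1 Main St", "Austin", "TX", "555-0100", "2012-01-01"),
    ("Ann", "B", "Adams", "1980-01-01", "1 Main St", "Austin", "TX", "555-0100", "2013-06-01"),
    ("Bob", "C", "Brown", "1975-05-05", "2 Oak Ave", "Boston", "MA", "555-0101", "2011-03-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(f"create table PF_Temp ({', '.join(MATCH_COLS)}, validitydate)")
conn.executemany("insert into PF_Temp values (?,?,?,?,?,?,?,?,?)", ROWS)

# Same join condition as the answer: all matching fields equal, b newer than a.
on_clause = " and ".join(f"b.{c} = a.{c}" for c in MATCH_COLS)
older_dupes = (
    "select a.rowid from PF_Temp a inner join PF_Temp b "
    f"on {on_clause} and b.validitydate > a.validitydate "
    "where a.lastname like 'A%'"
)

# Preview the validity dates of the rows that would be deleted.
doomed = conn.execute(
    f"select validitydate from PF_Temp where rowid in ({older_dupes})"
).fetchall()

# SQLite workaround: delete by rowid rather than "delete a from ... join".
conn.execute(f"delete from PF_Temp where rowid in ({older_dupes})")
remaining = conn.execute(
    "select firstname, validitydate from PF_Temp order by firstname"
).fetchall()

print("deleted:", doomed)
print("remaining:", remaining)
```

Running the select on its own before the delete is a cheap safety check: it lists exactly the rows the join considers older duplicates.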

Update

You mention that some columns may contain nulls. Here's a revised version of the above that takes into account that a null never compares equal to another null (`null = null` is not true).
```
delete a
from PF_Temp a
inner join PF_Temp b 
on  ((b.firstname = a.firstname) or (b.firstname is null and a.firstname is null))
and ((b.middlename = a.middlename) or (b.middlename is null and a.middlename is null))
and ((b.lastname = a.lastname) or (b.lastname is null and a.lastname is null))
and ((b.DOB = a.DOB) or (b.DOB is null and a.DOB is null))
and ((b.address = a.address) or (b.address is null and a.address is null))
and ((b.city = a.city) or (b.city is null and a.city is null))
and ((b.state = a.state) or (b.state is null and a.state is null))
and ((b.phone = a.phone) or (b.phone is null and a.phone is null))
and b.validitydate > a.validitydate
```
An alternative to the above would be `on coalesce(b.firstname,'') = coalesce(a.firstname,'')` (repeating that pattern for all other matching fields), though that would mean nulls and blanks are treated as the same value, and it wouldn't perform quite as well.
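
The difference between the three comparisons can be checked with a small sketch, again using an in-memory SQLite database from Python as a stand-in. The table is cut down to three columns and the rows are invented for illustration; the helper `older_dupes` is hypothetical and only exists to swap the middlename test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table PF_Temp (firstname, middlename, validitydate)")
conn.executemany("insert into PF_Temp values (?,?,?)", [
    ("Ann", None, "2012-01-01"),   # null middlename, older
    ("Ann", None, "2013-06-01"),   # null middlename, newer
    ("Bob", None, "2012-01-01"),   # null middlename, older
    ("Bob", "",   "2013-06-01"),   # blank middlename, newer
])

def older_dupes(middlename_test):
    """Return first names of rows that have a newer duplicate under the given test."""
    sql = ("select a.firstname from PF_Temp a inner join PF_Temp b "
           f"on b.firstname = a.firstname and {middlename_test} "
           "and b.validitydate > a.validitydate order by a.firstname")
    return [row[0] for row in conn.execute(sql)]

# Plain equality: a null on either side never matches.
plain = older_dupes("b.middlename = a.middlename")

# Null-safe test: null matches null, but null does not match a blank.
null_safe = older_dupes("((b.middlename = a.middlename) or "
                        "(b.middlename is null and a.middlename is null))")

# coalesce test: null and blank are folded into the same value.
coalesced = older_dupes("coalesce(b.middlename, '') = coalesce(a.middlename, '')")

print(plain, null_safe, coalesced)
```

With this data the plain test finds no duplicates, the null-safe test finds only the Ann pair, and the coalesce test also pairs the null Bob row with the blank one, which is the trade-off described above.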

Alternative Method

A different approach, which is more forgiving of nulls (`partition by` places nulls in the same group), is to use a subquery to pull back all rows and number them within each set of matching values, starting at 1 for the most recent validity date. We then delete all rows that came back with a number higher than 1; i.e. any duplicates with earlier validity dates.
```
delete TheDeletables
from 
(
    select *
    , row_number() over (
        partition by 
         firstname 
        , middlename 
        , lastname 
        , DOB 
        , address  
        , city 
        , state 
        , phone 
        order by validitydate desc
    ) rowid
    from PF_Temp
) TheDeletables
where rowid > 1;
```
Demo SQL Fiddle.
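
As a rough illustration of the numbering step, the sketch below runs the same `row_number()` logic on an in-memory SQLite database from Python. It assumes a SQLite build with window functions (3.25 or later), uses a cut-down table with invented rows, and deletes through `rowid`, since SQLite cannot delete from a derived table the way the statement above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table PF_Temp (firstname, middlename, lastname, validitydate)")
conn.executemany("insert into PF_Temp values (?,?,?,?)", [
    ("Ann", None, "Adams", "2012-01-01"),
    ("Ann", None, "Adams", "2013-06-01"),
    ("Ann", None, "Adams", "2011-02-02"),
    ("Bob", "C",  "Brown", "2011-03-03"),
])

# Number the rows in each group of matching values, newest first.
numbered = conn.execute("""
    select firstname, validitydate,
           row_number() over (
               partition by firstname, middlename, lastname
               order by validitydate desc
           ) as rn
    from PF_Temp
    order by firstname, rn
""").fetchall()

# Delete everything numbered above 1, identified by rowid (SQLite workaround).
conn.execute("""
    delete from PF_Temp
    where rowid in (
        select rid from (
            select rowid as rid,
                   row_number() over (
                       partition by firstname, middlename, lastname
                       order by validitydate desc
                   ) as rn
            from PF_Temp
        )
        where rn > 1
    )
""")
kept = conn.execute(
    "select firstname, validitydate from PF_Temp order by firstname"
).fetchall()

print(numbered)
print(kept)
```

Note that the three Ann rows land in one partition even though their middlename is null, so no extra null handling is needed with this method.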