Deleting duplicate row that has earliest date

后端 未结 1 1536
予麋鹿
予麋鹿 2021-01-25 01:00

I have a table called PF_temo that has the following structure:

  • firstname
  • middlename
  • lastname
  • DOB
  • address
相关标签:
1条回答
  • 2021-01-25 01:25

    Try this:

    delete a
    from PF_Temp a
    inner join PF_Temp b 
    on  b.firstname = a.firstname 
    and b.middlename = a.middlename
    and b.lastname = a.lastname
    and b.DOB = a.DOB
    and b.address = a.address
    and b.city = a.city
    and b.state = a.state
    and b.phone = a.phone
    and b.validitydate > a.validitydate
    

    Example at SQL Fiddle.

    The above works by:

    • joining on all matching fields (except validity date), thus capturing in a all records which have duplicates. At this stage we capture all records, since the record in a would match with itself in b.
    • By specifying that the validitydate in b must be greater than that in a we both avoid the above issue of the record being the same (since if it were the same record, the validity date would be the same), and also ensuring that there's no match if the record in a is the most recent; since there will be no match in b (i.e. no record with a greater validity date).
    • we then delete every record which was returned by a; i.e. every record which has a duplicate with a later validity date.

    If you want to only delete those duplicates with a specific last name, you do exactly what you said above; i.e. add the line where a.LastName like 'A%'.


    Update

    You mention that some columns may contain nulls. Here's a revised version of the above to take into account that null != null.

    delete a
    from PF_Temp a
    inner join PF_Temp b 
    on  ((b.firstname = a.firstname) or (b.firstname is null and a.firstname is null))
    and ((b.middlename = a.middlename) or (b.middlename is null and a.middlename is null))
    and ((b.lastname = a.lastname) or (b.lastname is null and a.lastname is null))
    and ((b.DOB = a.DOB) or (b.DOB is null and a.DOB is null))
    and ((b.address = a.address) or (b.address is null and a.address is null))
    and ((b.city = a.city) or (b.city is null and a.city is null))
    and ((b.state = a.state) or (b.state is null and a.state is null))
    and ((b.phone = a.phone) or (b.phone is null and a.phone is null))
    and b.validitydate > a.validitydate
    

    An alternative to the above would be on coalesce(b.firstname,'') = coalesce(a.firstname) (repeating that pattern for all other matching fields); though that would mean that nulls and blanks were treated the same, and wouldn't perform quite so well.


    Alternative Method

    A different approach, which is more forgiving of nulls, is to use a subquery to pull back all values, numbering each set with matching values, starting at 1 for the most recent validity date. We then delete all those rows which came back with numbers higher than 1; i.e. any which are duplicates with earlier validity dates.

    delete TheDeletables
    from 
    (
        select *
        , row_number() over (
            partition by 
             firstname 
            , middlename 
            , lastname 
            , DOB 
            , address  
            , city 
            , state 
            , phone 
            order by validitydate desc
        ) rowid
        from PF_Temp
    ) TheDeletables
    where rowid > 1;
    

    Demo SQL Fiddle.

    0 讨论(0)
提交回复
热议问题