How can I find duplicate entries and delete the oldest ones in SQL?

轻奢々 2020-12-18 04:13

I've got a table that has rows that are unique except for one value in one column (let's call it 'Name'). Another column is 'Date', which is the date it was added to th

4 Answers
  • 2020-12-18 04:49

    Find the duplicates and delete the older ones, keeping the newest row per Name.

    Here is the code:

    create table #Product (
        ID      int identity(1, 1) primary key,
        Name        varchar(800),
        DateAdded   datetime default getdate()
    )
    
    insert  #Product(Name) select 'Chocolate'
    insert  #Product(Name,DateAdded) select 'Candy', GETDATE() + 1
    insert  #Product(Name,DateAdded) select 'Chocolate', GETDATE() + 5
    select * from #Product
    
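    -- Rank the rows within each Name, newest DateAdded first.
    ;with Ranked as (
        select  ID, 
            dense_rank() 
            over (partition by Name order by DateAdded desc) as DupeCount
        from    #Product P
    )
    -- Rank 1 is the newest row per Name; delete everything older.
    -- Rows that tie on DateAdded share a rank, so tied newest rows are all kept.
    delete  R
    from    Ranked R
    where   R.DupeCount > 1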
    
    select * from #Product
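
    As a rough illustration of the tie case, here is a minimal sketch that swaps in ROW_NUMBER with an ID tie-breaker instead of DENSE_RANK. It runs against SQLite through Python rather than SQL Server, and the `product` table and its sample rows are made up for the example. It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical in-memory table loosely mirroring #Product (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, date_added TEXT)"
)
conn.executemany(
    "INSERT INTO product (name, date_added) VALUES (?, ?)",
    [
        ("Chocolate", "2020-12-01"),
        ("Candy", "2020-12-02"),
        ("Chocolate", "2020-12-06"),
        ("Chocolate", "2020-12-06"),  # same date as the row above: a tie
    ],
)

# ROW_NUMBER gives tied rows distinct numbers (DENSE_RANK would give both 1),
# so exactly one row per name gets rn = 1; the higher id wins the tie.
conn.execute(
    """
    DELETE FROM product
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY name
                       ORDER BY date_added DESC, id DESC
                   ) AS rn
            FROM product
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute("SELECT id, name FROM product ORDER BY id").fetchall()
print(remaining)
```

    With DENSE_RANK, both tied Chocolate rows would have survived; whether that is desirable depends on whether rows with identical dates count as duplicates in your data.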
    
  • 2020-12-18 04:49

    I just googled it and found this: https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/

    This one seems the easiest to read and understand to me:

    DELETE FROM [SampleDB].[dbo].[Employee]
        WHERE ID NOT IN
        (
            SELECT MAX(ID) AS MaxRecordID
            FROM [SampleDB].[dbo].[Employee]
            GROUP BY [FirstName], 
                     [LastName], 
                     [Country]
        );
    

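    In your scenario you can group by Name and select MAX(Date) instead of MAX(ID). Note that the comparison then has to match on both Name and Date; comparing on Date alone would keep an old row whose date happens to be the newest date for a different name.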
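
    A minimal sketch of that adaptation, run against SQLite through Python rather than SQL Server. The `employee` table, its columns, and the sample rows are hypothetical, and the row-value comparison `(name, date_added) NOT IN (...)` assumes SQLite 3.15 or later; other databases may need a join or a correlated subquery instead.

```python
import sqlite3

# Hypothetical table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, date_added TEXT)"
)
conn.executemany(
    "INSERT INTO employee (name, date_added) VALUES (?, ?)",
    [
        ("Ann", "2020-01-01"),
        ("Ann", "2020-03-01"),
        ("Bob", "2020-01-01"),  # Bob's only row; same date as Ann's old row
    ],
)

# Comparing on the date alone: Ann's old row matches Bob's newest date,
# so this filter would treat all three rows as keepers.
naive_kept = conn.execute(
    """
    SELECT COUNT(*) FROM employee
    WHERE date_added IN (
        SELECT MAX(date_added) FROM employee GROUP BY name
    )
    """
).fetchone()[0]

# Comparing on (name, date) pairs: a row survives only if it carries
# the newest date for its own name.
conn.execute(
    """
    DELETE FROM employee
    WHERE (name, date_added) NOT IN (
        SELECT name, MAX(date_added) FROM employee GROUP BY name
    )
    """
)

kept = conn.execute(
    "SELECT name, date_added FROM employee ORDER BY name"
).fetchall()
print(naive_kept, kept)
```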

  • 2020-12-18 04:50

    You could probably achieve this with a self-join and an IS NOT NULL check.

    Joins in DELETE queries can be a little dangerous: the more complex the join, the greater the risk of deleting more than you intend.

    I would approach it like this:

    DELETE
      a.*
    FROM
      mytable AS a
      -- b is a newer row with the same name as a
      -- (or one with the same date and a higher rowid)
      LEFT JOIN mytable AS b ON
        b.name = a.name
        AND (b.date > a.date OR (b.date = a.date AND b.rowid > a.rowid))
    WHERE
      b.rowid IS NOT NULL
    

    The join and the IS NOT NULL check find every row for which a newer row with the same name exists. It also handles two rows with the same date correctly: if they have the same date, the one with the lower rowid (whatever your row identifier is) is treated as the older one.

    Hopefully something like this works.
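
    To check the tie-breaking logic before deleting anything, the join condition can be run as a SELECT first. This is only a sketch: it uses SQLite through Python, which has no DELETE ... JOIN, so the join is wrapped in an IN subquery instead, and it uses an inner join, which selects the same rows as the LEFT JOIN plus IS NOT NULL. The table contents and the `id` and `added` column names are made up.

```python
import sqlite3

# Hypothetical data; id stands in for rowid, added for the date column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, added TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (name, added) VALUES (?, ?)",
    [
        ("widget", "2020-05-01"),
        ("widget", "2020-05-01"),  # ties with the row above on date
        ("widget", "2020-04-01"),
        ("gadget", "2020-05-01"),
    ],
)

# Rows for which a newer row with the same name exists;
# equal dates fall back to comparing ids.
doomed_sql = """
    SELECT DISTINCT a.id
    FROM mytable AS a
    JOIN mytable AS b
      ON b.name = a.name
     AND (b.added > a.added OR (b.added = a.added AND b.id > a.id))
    ORDER BY a.id
"""

# Dry run: inspect what would be deleted.
doomed = [row[0] for row in conn.execute(doomed_sql)]

# Then delete exactly those rows.
conn.execute(f"DELETE FROM mytable WHERE id IN ({doomed_sql})")
survivors = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(doomed, survivors)
```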

  • 2020-12-18 04:58

    -- mytable is a placeholder name; "table" itself is a reserved word
    delete from mytable a1
    where exists (
        select * from mytable a2
        where a2.name = a1.name and a2.date > a1.date
    )
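
    One thing to be aware of with the strict `>` comparison: rows that share the newest date are all kept. A small sketch of that behaviour, using SQLite through Python with a made-up `items` table; the outer table is referenced by its own name here instead of an alias.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, added TEXT)"
)
conn.executemany(
    "INSERT INTO items (name, added) VALUES (?, ?)",
    [
        ("A", "2020-01-01"),
        ("A", "2020-02-01"),
        ("A", "2020-02-01"),  # same newest date as the row above
    ],
)

# Delete every row that has a strictly newer row with the same name.
cursor = conn.execute(
    """
    DELETE FROM items
    WHERE EXISTS (
        SELECT 1 FROM items AS newer
        WHERE newer.name = items.name
          AND newer.added > items.added
    )
    """
)
deleted = cursor.rowcount  # number of rows removed by the DELETE

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(deleted, remaining_ids)
```

    If exactly one row per name has to survive, a tie-breaker on a unique column is needed in addition to the date comparison.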
