I've got a table that has rows that are unique except for one value in one column (let's call it 'Name'). Another column is 'Date' which is the date it was added to th
Find duplicates and delete oldest one
Here is the code:
create table #Product (
ID int identity(1, 1) primary key,
Name varchar(800),
DateAdded datetime default getdate()
)
insert #Product(Name) select 'Chocolate'
insert #Product(Name,DateAdded) select 'Candy', GETDATE() + 1
insert #Product(Name,DateAdded) select 'Chocolate', GETDATE() + 5
select * from #Product
;with Ranked as (
select ID,
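-- row_number() instead of dense_rank(): rows tied on DateAdded still get distinct ranks,
-- so exactly one row per Name is kept (ID breaks the tie)
row_number()
over (partition by Name order by DateAdded desc, ID desc) as DupeCount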
from #Product P
)
delete R
from Ranked R
where R.DupeCount > 1
select * from #Product
I just googled it and found this: https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/
This one seems the easiest to read and understand to me:
DELETE FROM [SampleDB].[dbo].[Employee]
WHERE ID NOT IN
(
SELECT MAX(ID) AS MaxRecordID
FROM [SampleDB].[dbo].[Employee]
GROUP BY [FirstName],
[LastName],
[Country]
);
In your scenario, you can group by Name and keep the row with the max Date instead of the max ID.
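A minimal sketch of that idea in T-SQL, assuming the #Product table (Name, DateAdded) from the earlier answer; the alias names P and Newest are only illustrative. A plain NOT IN on the date alone would be unsafe, because a date that is the maximum for one name could also appear on an older row of another name, so the sketch joins on the name and compares dates per group. Rows that tie on the newest date are all kept.

```sql
-- Assumption: #Product(ID, Name, DateAdded) as created in the earlier answer.
DELETE P
FROM #Product AS P
JOIN (
    -- Newest DateAdded for each Name
    SELECT Name, MAX(DateAdded) AS MaxDateAdded
    FROM #Product
    GROUP BY Name
) AS Newest
    ON Newest.Name = P.Name
-- Delete only rows that are strictly older than the newest row of the same Name
WHERE P.DateAdded < Newest.MaxDateAdded;
```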
You could probably achieve this with a self-join and an IS NOT NULL check.
Joins in DELETE queries can be a little dangerous: the more complex the join, the greater the risk of deleting more than you intend in some circumstances.
But I would approach it like this:
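DELETE
a.*
FROM
mytable AS a
LEFT JOIN mytable AS b ON
-- b is a row with the same name that is newer than a
b.name = a.name
-- newer means a later date, or the same date and a higher rowid
AND (b.date > a.date OR (b.date = a.date AND b.rowid > a.rowid))
WHERE
b.rowid IS NOT NULL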
The join and the IS NOT NULL finds every row for which there exists a newer row with the same name. It also handles the case of two rows with the same date correctly - if they have the same date, then it goes by rowid (whatever that is).
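Given the risk of deleting too much, one option is to run the same condition as a SELECT first and check the rows before deleting anything. This is only a sketch: mytable, name, date and rowid are the placeholder names used in this answer, and the condition is the intended one (same name, and either a later date or the same date with a higher rowid).

```sql
-- Preview: every row for which a newer row with the same name exists.
-- These are the rows the DELETE is meant to remove.
SELECT a.*
FROM mytable AS a
WHERE EXISTS (
    SELECT 1
    FROM mytable AS b
    WHERE b.name = a.name
      AND (b.date > a.date
           OR (b.date = a.date AND b.rowid > a.rowid))
);
```

If the result contains only the rows you expect to lose, the same condition can be used in the DELETE.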
Hopefully something like this works.
delete from mytable a1 where exists (select * from mytable a2 where a2.name = a1.name and a2.date > a1.date)