How to detect/remove duplicate entries from a database table that has no primary key?
If we use 'DISTINCT', how do we know which record is the correct one and which is the duplicate?
If you have duplicate rows, then it doesn't matter which duplicate is picked, because they are all the same!
I guess that when you say "there is no primary key" you actually mean there is no simple single-column 'surrogate' candidate key, such as an incrementing sequence of integers (preferably with no gaps), but that there is a multi-column compound 'natural' candidate key (though one that does not comprise all the columns).
If this is the case, you'd look for something to break ties, e.g. a column named DateChanged, as per @Dave's answer. Otherwise, you need to pick an arbitrary row, e.g. the answer by @Surfer513 does this using the ROW_NUMBER() windowed function partitioned over (YourFirstPossibleDuplicateField, YourSecondPossibleDuplicateField) (i.e. your natural key), then keeping the duplicate that got arbitrarily assigned row number 1.
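For the "detect" half of the question, a minimal sketch of listing which natural-key values occur more than once, using GROUP BY ... HAVING. It assumes SQLite through Python's built-in sqlite3 module purely so that it is self-contained; the table and sample rows are hypothetical, with (OrderNum, RecordType) standing in for the natural key.

```python
import sqlite3

# Hypothetical table with no primary key; (OrderNum, RecordType) plays the
# role of the compound natural key discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryTable (OrderNum INTEGER, RecordType TEXT, DateChanged TEXT)"
)
conn.executemany(
    "INSERT INTO HistoryTable VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (1, "A", "2020-02-01"),  # same natural key as the row above
        (1, "B", "2020-01-15"),
        (2, "A", "2020-03-01"),
        (2, "A", "2020-03-01"),  # exact duplicate of the row above
    ],
)

# Natural-key values that appear more than once, with the number of copies.
duplicate_keys = conn.execute(
    """
    SELECT OrderNum, RecordType, COUNT(*) AS copies
    FROM HistoryTable
    GROUP BY OrderNum, RecordType
    HAVING COUNT(*) > 1
    ORDER BY OrderNum, RecordType
    """
).fetchall()
print(duplicate_keys)
```

Grouping by every column instead of just the key columns would report only the rows that are identical in full.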
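-- Number the rows within each group of possible duplicates,
-- then delete everything except row 1 of each group.
delete f
from
(
    select ROW_NUMBER()
        over (partition by
                  YourFirstPossibleDuplicateField,
                  YourSecondPossibleDuplicateField
              order by WhateverFieldYouWantSortedBy) as DelId
    from YourTable
) as f
where DelId > 1;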
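The statement above deletes through a derived table, which not every engine allows. As a rough, runnable sketch of the same idea, here it is in SQLite via Python's sqlite3 module. This is a swapped-in variant, not the answer's exact statement: it relies on SQLite's implicit rowid to identify the rows to delete, it needs SQLite 3.25 or later for window functions, and the columns KeyA, KeyB and Payload are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without a declared primary key.
conn.execute("CREATE TABLE YourTable (KeyA INTEGER, KeyB TEXT, Payload TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, "x", "first"),
        (1, "x", "second"),
        (1, "x", "third"),
        (2, "y", "only"),
    ],
)

# Number the rows inside each (KeyA, KeyB) group, then delete every row whose
# number is greater than 1. SQLite's implicit rowid (aliased as rid) tells
# the outer DELETE which physical rows to remove.
conn.execute(
    """
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY KeyA, KeyB
                       ORDER BY Payload
                   ) AS DelId
            FROM YourTable
        )
        WHERE DelId > 1
    )
    """
)

remaining = conn.execute(
    "SELECT KeyA, KeyB, Payload FROM YourTable ORDER BY KeyA, KeyB"
).fetchall()
print(remaining)
```

Running the inner SELECT on its own first is a cheap way to preview which rows would be removed before committing to the DELETE.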
I created a view whose query used PARTITION BY rather than DISTINCT. I needed the most recent entry among records with the same OrderNum and RecordType fields, discarding the others. The partitions are ordered by date, and then the top row of each is selected, like this:
WITH q AS (
    SELECT *, ROW_NUMBER()
        OVER (PARTITION BY OrderNum, RecordType ORDER BY DateChanged DESC) rn
    FROM HistoryTable
)
SELECT * FROM q WHERE rn = 1
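To try the "most recent row per key" view without a SQL Server instance, here is a small sketch in SQLite via Python's sqlite3 module. The view name LatestHistory and the sample rows are made up for illustration, and the dates are stored as ISO-8601 text so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryTable (OrderNum INTEGER, RecordType TEXT, DateChanged TEXT)"
)
conn.executemany(
    "INSERT INTO HistoryTable VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (1, "A", "2020-02-01"),  # newer entry for the same key
        (1, "B", "2020-01-15"),
        (2, "A", "2020-03-01"),
    ],
)

# The view keeps only the newest row of each (OrderNum, RecordType) group.
conn.execute(
    """
    CREATE VIEW LatestHistory AS
    WITH q AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderNum, RecordType
                   ORDER BY DateChanged DESC
               ) AS rn
        FROM HistoryTable
    )
    SELECT OrderNum, RecordType, DateChanged FROM q WHERE rn = 1
    """
)

latest = conn.execute(
    "SELECT * FROM LatestHistory ORDER BY OrderNum, RecordType"
).fetchall()
print(latest)
```

If two rows in a group share the same DateChanged, the one that gets row number 1 is arbitrary; adding another column to the ORDER BY would make the choice deterministic.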