Interview - Detect/remove duplicate entries


感情败类

How do you detect and remove duplicate entries from a database table that has no primary key?

[If we use 'DISTINCT', how do we know which record is the correct one and which is the duplicate?]


3 Answers

南笙

2020-12-11 14:03

> If we use 'DISTINCT' how do we know which record is the correct one and duplicate one?

If you have duplicate rows, then it doesn't matter which duplicate is picked, because they are all the same!
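A minimal sketch of that case, assuming SQLite driven from Python's `sqlite3` module and a hypothetical `Orders` table with made-up rows (none of these names come from the question). Grouping on every column finds the exact duplicates, and DISTINCT collapses them without having to choose between copies:

```python
import sqlite3

# Hypothetical table with no primary key; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNum INTEGER, RecordType TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "A"), (1, "A"), (2, "B"), (3, "C"), (3, "C"), (3, "C")],
)

# Detect: group on every column and keep the groups that occur more than once.
duplicates = conn.execute(
    """
    SELECT OrderNum, RecordType, COUNT(*) AS copies
    FROM Orders
    GROUP BY OrderNum, RecordType
    HAVING COUNT(*) > 1
    ORDER BY OrderNum
    """
).fetchall()

# Remove (in the result set only): DISTINCT collapses identical rows into one.
distinct_rows = conn.execute(
    "SELECT DISTINCT OrderNum, RecordType FROM Orders ORDER BY OrderNum"
).fetchall()

print(duplicates)
print(distinct_rows)
```

Note that DISTINCT only deduplicates the query result; the stored table still holds every copy.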

I guess that when you say "there is no primary key" you actually mean there is no simple single-column 'surrogate' candidate key, such as an incrementing sequence of integers (preferably with no gaps), but that there is a multi-column compound 'natural' candidate key (though one that does not comprise all the columns).

If this is the case, you'd look for something to break ties, e.g. a column named DateChanged as per @Dave's answer. Otherwise, you need to pick an arbitrary row: e.g. the answer by @Surfer513 does this using the ROW_NUMBER() window function partitioned by (YourFirstPossibleDuplicateField, YourSecondPossibleDuplicateField) (i.e. your natural key), then keeping the row in each partition that was arbitrarily assigned row number 1.


再見小時候

2020-12-11 14:10

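-- Number the rows within each group of duplicates, then delete all but the first.
delete f
from
(
    select ROW_NUMBER()
        over (partition by
            YourFirstPossibleDuplicateField,
            YourSecondPossibleDuplicateField
            order by WhateverFieldYouWantSortedBy) as DelId
    from YourTable
) as f
where DelId > 1  -- DelId = 1 is the row that is kept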
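For trying the same idea without a SQL Server instance, here is a hedged sketch in SQLite via Python's `sqlite3` module. It is a swapped-in variant, not the statement above: SQLite cannot delete through a derived table, so it deletes by the implicit `rowid` instead. The columns `FieldA`, `FieldB`, `SortField` and the sample rows are hypothetical, and window functions assume SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (FieldA TEXT, FieldB TEXT, SortField INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("x", "y", 1), ("x", "y", 2), ("x", "z", 1), ("x", "y", 3)],
)

# Number the rows inside each (FieldA, FieldB) partition, then delete every
# row whose number is greater than 1, identified by SQLite's implicit rowid.
conn.execute(
    """
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY FieldA, FieldB
                       ORDER BY SortField
                   ) AS DelId
            FROM YourTable
        )
        WHERE DelId > 1
    )
    """
)

remaining = conn.execute(
    "SELECT FieldA, FieldB, SortField FROM YourTable ORDER BY FieldA, FieldB"
).fetchall()
print(remaining)
```

Because the partition is ordered by `SortField` ascending, the row with the lowest `SortField` in each group is the one that survives; reversing the order keeps the highest instead.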


庸人自扰

2020-12-11 14:26
I created a view where DISTINCT was not actually part of the query; PARTITION BY was. I needed the most recent entry among records with the same OrderNum and RecordType fields, discarding the others. The rows in each partition are ordered by date, and then the top row is selected, like this:
```
WITH q AS (
    SELECT *, ROW_NUMBER()
        OVER (PARTITION BY OrderNum, RecordType ORDER BY DateChanged DESC) AS rn
    FROM HistoryTable
)
SELECT * FROM q WHERE rn = 1
```
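The "most recent row per OrderNum and RecordType" selection can be checked end to end with a small script. This is only a sketch under assumptions: it uses SQLite through Python's `sqlite3` module rather than the original database, the sample rows are invented, and `DateChanged` is stored as ISO-8601 text so that string ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryTable (OrderNum INTEGER, RecordType TEXT, DateChanged TEXT)"
)
conn.executemany(
    "INSERT INTO HistoryTable VALUES (?, ?, ?)",
    [
        (100, "S", "2020-01-05"),
        (100, "S", "2020-03-01"),
        (100, "R", "2020-02-10"),
        (200, "S", "2020-01-20"),
        (200, "S", "2020-01-02"),
    ],
)

# Rank rows newest-first within each (OrderNum, RecordType) group and keep rank 1.
latest = conn.execute(
    """
    WITH q AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderNum, RecordType
                   ORDER BY DateChanged DESC
               ) AS rn
        FROM HistoryTable
    )
    SELECT OrderNum, RecordType, DateChanged
    FROM q
    WHERE rn = 1
    ORDER BY OrderNum, RecordType
    """
).fetchall()
print(latest)
```

If two rows in a group share the same `DateChanged`, ROW_NUMBER() still assigns them different numbers, so which one gets 1 is arbitrary unless another column is added to the ORDER BY.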