Interview - Detect/remove duplicate entries

后端 未结 3 533
感情败类
感情败类 2020-12-11 13:29

how to detect/remove duplicate entries from a database in a table where there is no primary key ?

[If we use \'DISTINCT\' how do we know which record is the correct

相关标签:
3条回答
  • 2020-12-11 14:03

    If we use 'DISTINCT' how do we know which record is the correct one and duplicate one?

    If you have duplicate rows then doesn't matter which duplicate is picked because they are all the same!

    I guess when you say "there is no primary key" that you actually mean there is no simple single-column 'surrogate' candidate key such as an incrementing sequence of integers, preferably with no gaps, but that there is a multi-column compound 'natural' candidate key (though does not comprise all the columns).

    If this is the case, you'd look for something to break ties e.g. a column named DateChanged as per @Dave's answer. Otherwise, you need to pick am arbitrary row e.g. the answer by @Surfer513 does this using the ROW_NUMBER() windowed function over (YourFirstPossibleDuplicateField, YourSecondPossibleDuplicateField) (i.e. your natural key) then picking the duplicate that got arbitrarily assigned the row number 1.

    0 讨论(0)
  • 2020-12-11 14:10
    delete f
    from
    (
        select ROW_NUMBER() 
            over (partition by 
                YourFirstPossibleDuplicateField,
                YourSecondPossibleDuplicateField
                order by WhateverFieldYouWantSortedBy) as DelId
        from YourTable
    ) as f
    where DelId > 1
    
    0 讨论(0)
  • 2020-12-11 14:26

    I created a view where DISTINCT actually was not a part of the query, but PARTITION. I needed the most recent entry to records with the same Ordernum and RecordType fields, discarding the others. The partitions are ordered by date, and then the top row is selected, like this:

    SELECT *, ROW_NUMBER() 
    OVER (PARTITION BY OrderNum, RecordType ORDER BY DateChanged DESC) rn
    FROM HistoryTable SELECT * FROM q WHERE rn = 1
    
    0 讨论(0)
提交回复
热议问题