How to delete duplicate rows in SQL Server?

长情又很酷 2020-11-22 00:58

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1          


        
23 Answers
  • 2020-11-22 01:37

    There are several ways to do this in SQL Server. The simplest is to copy the distinct rows from the table that contains duplicates into a new temporary table, delete all rows from the original table, and then insert the de-duplicated rows back from the temporary table, as shown below.

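    -- copy one instance of each distinct row into a temp table
    select distinct * into #tmp from [table]
    -- empty the original table, then reload it without duplicates
    delete from [table]
    insert into [table]
    select * from #tmp
    drop table #tmp

    -- check the result
    select * from [table]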
    

    Alternatively, delete the duplicate rows using a common table expression (CTE):

    with CTE_Duplicates as
    (
        select id, name,
               row_number() over (partition by id, name order by id, name) as rownumber
        from [table]
    )
    delete from CTE_Duplicates where rownumber <> 1
    
  • 2020-11-22 01:39

    This should work in SQL Server just as in other SQL databases such as Postgres, provided the table has a unique id column:

    DELETE FROM table
    WHERE id NOT IN (
       select min(id) from table
       group by col1, col2, col3, col4, col5, col6, col7
    )
    
  • 2020-11-22 01:40

    I like CTEs and ROW_NUMBER because the two combined let us see which rows will be deleted (or updated). To preview them, just change the DELETE FROM CTE... to SELECT * FROM CTE:

    WITH CTE AS(
       SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
       FROM dbo.Table1
    )
    DELETE FROM CTE WHERE RN > 1
    

    DEMO (result is different; I assume that it's due to a typo on your part)

    COL1    COL2    COL3    COL4    COL5    COL6    COL7
    john    1        1       1       1       1       1
    sally   2        2       2       2       2       2
    

    This example determines duplicates by a single column, col1, because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY:

    ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
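
    For the table in the question, where a duplicate presumably means that all seven columns match, a preview query might look like the sketch below. It reuses dbo.Table1 and col1..col7 from the example above. ORDER BY (SELECT NULL) is an assumption on my part: it only says that we do not care which copy in each group gets RN = 1.

    WITH CTE AS(
       SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
           RN = ROW_NUMBER()OVER(
                  PARTITION BY col1, col2, col3, col4, col5, col6, col7
                  ORDER BY (SELECT NULL))  -- arbitrary order inside each group
       FROM dbo.Table1
    )
    -- preview only: lists the extra copies without deleting anything
    SELECT * FROM CTE WHERE RN > 1

    Once the preview shows only the rows you expect to lose, replace the final SELECT * with DELETE.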
    
  • 2020-11-22 01:41

    You need to group the duplicate records by the field(s) that define a duplicate, then keep one record from each group and delete the rest. For example:

    DELETE prg.Person WHERE Id IN (
        SELECT dublicateRow.Id
        FROM (
            -- one row per duplicated NationalCode, with the Id to keep
            SELECT MIN(Id) MinId, NationalCode
            FROM prg.Person
            GROUP BY NationalCode
            HAVING COUNT(NationalCode) > 1
        ) GroupSelect
        JOIN prg.Person dublicateRow ON dublicateRow.NationalCode = GroupSelect.NationalCode
        -- every other row with that NationalCode is deleted
        WHERE dublicateRow.Id <> GroupSelect.MinId)
    
  • 2020-11-22 01:45

    If you have no references, like foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.

    SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
    INTO [newTable]
    FROM [oldTable]
    

    Go into the object explorer and delete the old table.

    Rename the new table with the old table's name.
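
    The same two steps can also be scripted instead of clicking through Object Explorer. This is a minimal sketch, assuming both tables live in the dbo schema and keep the placeholder names oldTable and newTable used above. Keep in mind that SELECT ... INTO does not copy indexes, constraints or triggers, so anything the old table had would need to be recreated on the new one.

    -- Assumption: nothing else references dbo.oldTable (no foreign keys).
    DROP TABLE dbo.oldTable;
    -- sp_rename takes the current name (optionally schema-qualified)
    -- and the new name without a schema prefix.
    EXEC sp_rename 'dbo.newTable', 'oldTable';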

  • 2020-11-22 01:46

    With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server

    Removing duplicates involves two things:

    • a) Protecting the rows that are not duplicates.
    • b) Retaining one of the rows in each group of duplicates.

    Step-by-step

    • 1) First, identify the rows that satisfy the definition of a duplicate and insert them into a temp table, say #tableAll.
    • 2) Select the distinct (de-duplicated) rows from #tableAll into another temp table, say #tableUnique.
    • 3) Delete the duplicates from the source table by joining it to #tableAll.
    • 4) Insert all the rows from #tableUnique into the source table.
    • 5) Drop #tableAll and #tableUnique.
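
    The steps above could be sketched roughly as follows. This is not the script from the linked article. It is an illustration under several assumptions of mine: the table is dbo.Table1 with columns col1..col7 (borrowed from the question), a duplicate means all seven columns match, and none of the columns contain NULL, because the equality comparisons in step 3 would not match NULL values. The transaction around steps 3 and 4 is also my addition, so the table is never left without the duplicated rows.

    -- 1) Every row that belongs to a duplicate group goes into #tableAll.
    SELECT col1, col2, col3, col4, col5, col6, col7
    INTO #tableAll
    FROM (
        SELECT col1, col2, col3, col4, col5, col6, col7,
               COUNT(*) OVER (PARTITION BY col1, col2, col3, col4,
                                           col5, col6, col7) AS cnt
        FROM dbo.Table1
    ) AS counted
    WHERE cnt > 1;

    -- 2) One copy of each duplicated row goes into #tableUnique.
    SELECT DISTINCT col1, col2, col3, col4, col5, col6, col7
    INTO #tableUnique
    FROM #tableAll;

    BEGIN TRANSACTION;

    -- 3) Remove every copy of the duplicated rows from the source table.
    --    Rows that were never duplicated are not in #tableAll and stay put.
    DELETE t
    FROM dbo.Table1 AS t
    WHERE EXISTS (
        SELECT 1
        FROM #tableAll AS a
        WHERE a.col1 = t.col1 AND a.col2 = t.col2 AND a.col3 = t.col3
          AND a.col4 = t.col4 AND a.col5 = t.col5 AND a.col6 = t.col6
          AND a.col7 = t.col7
    );

    -- 4) Put a single copy of each duplicated row back.
    INSERT INTO dbo.Table1 (col1, col2, col3, col4, col5, col6, col7)
    SELECT col1, col2, col3, col4, col5, col6, col7
    FROM #tableUnique;

    COMMIT TRANSACTION;

    -- 5) Clean up the temp tables.
    DROP TABLE #tableAll;
    DROP TABLE #tableUnique;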