How can I delete duplicate rows in a table

情歌与酒 2020-12-08 22:30

I have a table with, say, 3 columns. There's no primary key, so there can be duplicate rows. I need to keep just one of each and delete the others. Any idea how to do this in SQL Server?

  • 2020-12-08 23:02

    This is a way to do it with a common table expression (CTE). It involves no loops and no new columns, and because it only deletes the extra rows, it won't fire the unwanted triggers that a delete-everything-and-reinsert approach would.

    Inspired by this article.

    CREATE TABLE #temp (i INT)
    
    INSERT INTO #temp VALUES (1)
    INSERT INTO #temp VALUES (1)
    INSERT INTO #temp VALUES (2)
    INSERT INTO #temp VALUES (3)
    INSERT INTO #temp VALUES (3)
    INSERT INTO #temp VALUES (4)
    
    SELECT * FROM #temp
    
    -- Number each copy of a value (PARTITION BY) and delete every copy after the first
    ;WITH numbered AS
    (SELECT ROW_NUMBER() OVER (PARTITION BY i ORDER BY i) AS rn FROM #temp)
    DELETE FROM numbered WHERE rn > 1
    
    SELECT * FROM #temp
    
    DROP TABLE #temp   
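    The question's table has three columns, so the same pattern would partition by all of them: rows then count as duplicates only when every value matches. A minimal sketch, assuming a hypothetical temp table #my_table with columns col1, col2 and col3 (these names are illustrative, not from the original answer):

    ```sql
    -- Hypothetical three-column table with no primary key
    CREATE TABLE #my_table (col1 INT, col2 VARCHAR(10), col3 INT);

    INSERT INTO #my_table VALUES
        (1, 'a', 10), (1, 'a', 10), (1, 'a', 10),  -- three identical rows
        (1, 'a', 20),                              -- differs in col3, so it stays
        (2, 'b', 30);

    -- Number the copies within each group of identical rows; copy 1 is kept
    WITH numbered AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                  ORDER BY (SELECT NULL)) AS rn
        FROM #my_table
    )
    DELETE FROM numbered WHERE rn > 1;

    -- One row per distinct (col1, col2, col3) remains
    SELECT * FROM #my_table;
    ```

    ORDER BY (SELECT NULL) only satisfies ROW_NUMBER's required ORDER BY; since the rows in a partition are identical, it doesn't matter which copy survives.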
    
  • 2020-12-08 23:02

    This is a tough situation to be in. Without knowing your particular situation (table size, etc.), I think your best shot is to add an identity column, populate it, and then delete the duplicates based on it. You may remove the column later, but I would suggest keeping it, as it is really a good thing to have in the table.
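    A minimal sketch of that approach in T-SQL, assuming a hypothetical temp table #dupes with columns col1, col2 and col3. GO is the batch separator used by SSMS and sqlcmd (not a T-SQL statement); it starts a new batch so that the DELETE is compiled after the id column exists.

    ```sql
    -- Hypothetical sample table without a primary key
    CREATE TABLE #dupes (col1 INT, col2 INT, col3 INT);
    INSERT INTO #dupes VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2), (3, 3, 3);

    -- Add an identity column; SQL Server fills it in for the existing rows
    ALTER TABLE #dupes ADD id INT IDENTITY(1, 1) NOT NULL;
    GO

    -- Keep the row with the lowest id in each group of identical rows
    DELETE FROM #dupes
    WHERE id NOT IN (SELECT MIN(id) FROM #dupes GROUP BY col1, col2, col3);

    -- Optional: drop the helper column again (keeping it is recommended above)
    -- ALTER TABLE #dupes DROP COLUMN id;

    SELECT col1, col2, col3 FROM #dupes;
    ```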

  • 2020-12-08 23:03

    The following approach also works when the columns that should be unique (your intended primary key) are only a subset of the table's columns.

    (Note: I prefer the approach of adding a surrogate id column, but this solution may come in handy as well.)

    First find the duplicate rows:

    SELECT col1, col2, count(*)
    FROM t1
    GROUP BY col1, col2
    HAVING count(*) > 1
    

    If there are only a few, you can delete them manually:

    set rowcount 1
    delete from t1
    where col1=1 and col2=1
    set rowcount 0  -- reset, otherwise later statements are limited as well
    

    Set rowcount to n - 1, where n is the number of copies of that key. In this example there are two identical rows, so rowcount is 1. If several keys are duplicated, you have to repeat this for each of them.
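    Microsoft's documentation notes that SET ROWCOUNT will stop affecting DELETE, INSERT and UPDATE statements in a future SQL Server release and recommends TOP instead. A sketch of the same manual delete using DELETE TOP, assuming a hypothetical temp table #t1 in which the key (1, 1) appears three times:

    ```sql
    -- Hypothetical sample: the key (col1, col2) = (1, 1) appears three times
    CREATE TABLE #t1 (col1 INT, col2 INT, col3 INT);
    INSERT INTO #t1 VALUES (1, 1, 5), (1, 1, 5), (1, 1, 5), (2, 2, 6);

    -- Count the copies of this key (n)
    DECLARE @n INT = (SELECT COUNT(*) FROM #t1 WHERE col1 = 1 AND col2 = 1);

    -- Delete n - 1 of them; TOP picks arbitrary rows, which is fine for identical rows
    DELETE TOP (@n - 1) FROM #t1
    WHERE col1 = 1 AND col2 = 1;
    ```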

    If you have many duplicates, copy each duplicated key once into another table:

    SELECT col1, col2, col3=count(*)
    INTO holdkey
    FROM t1
    GROUP BY col1, col2
    HAVING count(*) > 1
    

    Then copy the rows for those keys into a third table, eliminating the duplicates:

    SELECT DISTINCT t1.*
    INTO holddups
    FROM t1, holdkey
    WHERE t1.col1 = holdkey.col1
    AND t1.col2 = holdkey.col2
    

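    holddups should now contain each key only once. Check that this query returns no rows:

    SELECT col1, col2, count(*)
    FROM holddups
    GROUP BY col1, col2
    HAVING count(*) > 1

    If it does return rows, those keys have rows that differ in the other columns, and you have to decide manually which one to keep.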
    

    Delete the duplicates from the original table:

    DELETE t1
    FROM t1, holdkey
    WHERE t1.col1 = holdkey.col1
    AND t1.col2 = holdkey.col2
    

    Insert the de-duplicated rows back:

    INSERT t1 SELECT * FROM holddups
    

    By the way, for completeness: in Oracle there is a hidden pseudocolumn, ROWID, that you could use:

    DELETE FROM our_table
    WHERE rowid NOT IN
    (SELECT MIN(rowid)
     FROM our_table
     GROUP BY column1, column2, column3 /* ... */);
    

    see: Microsoft Knowledge Site

  • 2020-12-08 23:07

    Manrico Corazzi: I specialize in Oracle, not MS SQL, so you'll have to tell me whether this would work as a performance boost:

    1. Keep your first step as is: insert the distinct values from TABLE1 into TABLE2.
    2. Drop TABLE1. (I assume dropping is faster than deleting, much as truncating is faster than deleting.)
    3. Rename TABLE2 to TABLE1. (This saves time, as you're renaming an object rather than copying data from one table to another.)
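    In T-SQL those three steps might look like the sketch below (TABLE1, TABLE2 and their columns are hypothetical; sp_rename is SQL Server's rename procedure). Note that SELECT ... INTO does not copy indexes, constraints, triggers or permissions, so those would have to be recreated on the renamed table.

    ```sql
    -- Hypothetical TABLE1 containing duplicate rows
    CREATE TABLE TABLE1 (col1 INT, col2 INT, col3 INT);
    INSERT INTO TABLE1 VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2);

    -- 1. Copy the distinct rows into a new table
    SELECT DISTINCT * INTO TABLE2 FROM TABLE1;

    -- 2. Drop the original table
    DROP TABLE TABLE1;

    -- 3. Give the new table the old name (prints a caution message about renaming)
    EXEC sp_rename 'TABLE2', 'TABLE1';
    ```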
  • 2020-12-08 23:09

    Can you add a primary key identity field to the table?

  • 2020-12-08 23:13

    How about:

    select distinct * into #t from duplicates_tbl
    
    truncate table duplicates_tbl
    
    insert duplicates_tbl select * from #t
    
    drop table #t
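    When running this against real data, it may be worth wrapping it in a transaction so that a failure between the TRUNCATE and the INSERT can't leave the table empty (in SQL Server, TRUNCATE TABLE can be rolled back inside a transaction). A sketch, assuming a hypothetical three-column duplicates_tbl:

    ```sql
    -- Hypothetical table with duplicate rows
    CREATE TABLE duplicates_tbl (col1 INT, col2 INT, col3 INT);
    INSERT INTO duplicates_tbl VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2);

    BEGIN TRY
        BEGIN TRANSACTION;

        SELECT DISTINCT * INTO #t FROM duplicates_tbl;   -- keep one copy of each row
        TRUNCATE TABLE duplicates_tbl;                   -- undone by ROLLBACK if needed
        INSERT INTO duplicates_tbl SELECT * FROM #t;
        DROP TABLE #t;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;  -- re-raise the original error (SQL Server 2012+)
    END CATCH;
    ```

    TRUNCATE TABLE is not allowed on a table referenced by a FOREIGN KEY constraint; in that case DELETE FROM duplicates_tbl would be needed instead.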
    