How to delete duplicate rows in SQL Server?

长情又很酷 2020-11-22 00:58

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1          


        
23 Answers
  • 2020-11-22 01:46

    Another way to remove duplicated rows in one step, without losing information, is the following:

    -- For each duplicated value, delete the rows whose field_kept is shortest.
    -- NOLOCK is not allowed on the target table of a DELETE, so t1 carries no hint.
    delete t1
    from dublicated_table t1
    join (
        select t2.dublicated_field
        , min(len(t2.field_kept)) as min_field_kept
        from dublicated_table t2 with (nolock)
        group by t2.dublicated_field having COUNT(*)>1
    ) t3
    on t1.dublicated_field=t3.dublicated_field
        and len(t1.field_kept)=t3.min_field_kept
    
  • 2020-11-22 01:48

    Try using:

    SELECT linkorder
        ,Row_Number() OVER (
            PARTITION BY linkorder ORDER BY linkorder DESC
            ) AS RowNum
    FROM u_links
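
    The query above only numbers the rows inside each linkorder group; it does not delete anything. A minimal sketch of how that numbering could drive a delete, assuming u_links already exists and should keep a single row per linkorder value (the CTE name numbered is made up for illustration):

```sql
-- Number the rows inside each group of equal linkorder values.
-- ORDER BY (SELECT NULL) states no preferred order, so which duplicate
-- survives is arbitrary; that is acceptable when the rows are identical.
WITH numbered AS (
    SELECT Row_Number() OVER (
               PARTITION BY linkorder ORDER BY (SELECT NULL)
           ) AS RowNum
    FROM u_links
)
-- Deleting through the CTE removes the underlying rows of u_links.
DELETE FROM numbered
WHERE RowNum > 1;
```

    Running the SELECT on its own first is a cheap way to check which rows would get RowNum > 1 before deleting anything.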
    

  • 2020-11-22 01:48

    Oh wow, I feel so stupid reading all these answers. They read like expert answers, with CTEs, temp tables, and so on.

    All I did to get it working was aggregate the ID column using MAX.

    -- [table] is a placeholder; substitute your table name.
    -- Duplicates are grouped by col1; the highest id in each group is deleted.
    DELETE FROM [table] WHERE id IN (
        SELECT MAX(id) FROM [table] GROUP BY col1 HAVING ( COUNT(*) > 1 )
    )
    

    NOTE: You might need to run it multiple times, because each run deletes only one row from each set of duplicates.
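
    The repeated runs mentioned in the note can be automated with a loop. A rough sketch, assuming the table has a unique id column and that duplicates are defined by col1; dbo.MyTable is a placeholder name, not something from the question:

```sql
-- Each pass deletes the highest id in every group that still has duplicates.
-- The loop stops once a pass deletes nothing.
DECLARE @deleted INT = 1;

WHILE @deleted > 0
BEGIN
    DELETE FROM dbo.MyTable
    WHERE id IN (
        SELECT MAX(id)
        FROM dbo.MyTable
        GROUP BY col1
        HAVING COUNT(*) > 1
    );

    -- @@ROWCOUNT holds the number of rows affected by the DELETE just above.
    SET @deleted = @@ROWCOUNT;
END;
```

    If looping is undesirable, keeping MIN(id) per group and deleting every other row does the same job in a single statement.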

  • 2020-11-22 01:51

    Deleting duplicates from a huge table (several million records) might take a long time. I suggest bulk inserting the rows you want to keep into a temp table rather than deleting.

    -- Rewriting your code (take note of the 3rd line)
    WITH CTE AS (SELECT NAME, ROW_NUMBER()
        OVER (PARTITION BY NAME ORDER BY NAME) AS ID FROM @TB)
    SELECT * INTO #unique_records FROM CTE
    WHERE ID = 1;
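
    For completeness, a sketch of one possible round trip: copy the unique rows out, empty the original table, and copy them back. dbo.BigTable and its single NAME column are placeholders. This assumes no foreign keys reference the table (TRUNCATE TABLE fails otherwise) and that there is no identity column whose values need preserving:

```sql
BEGIN TRANSACTION;

-- 1. Copy one row per NAME into a temp table.
WITH CTE AS (
    SELECT NAME,
           ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) AS ID
    FROM dbo.BigTable
)
SELECT NAME
INTO #unique_records
FROM CTE
WHERE ID = 1;

-- 2. Empty the original table.
TRUNCATE TABLE dbo.BigTable;

-- 3. Put the unique rows back.
INSERT INTO dbo.BigTable (NAME)
SELECT NAME FROM #unique_records;

COMMIT TRANSACTION;

DROP TABLE #unique_records;
```

    Wrapping the steps in a transaction means a failure partway through can be rolled back instead of leaving the table empty.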
    
  • 2020-11-22 01:51

    If you have the ability to add a column to the table temporarily, this was a solution that worked for me:

    ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)
    

    Then perform a DELETE using a combination of MIN and GROUP BY

    DELETE b
    FROM dbo.DUPPEDTABLE b
    WHERE b.RowID NOT IN (
                         SELECT MIN(RowID) AS RowID
                         FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
                         GROUP BY a.ITEM_NUMBER,
                                  a.CHARACTERISTIC,
                                  a.INTVALUE,
                                  a.FLOATVALUE,
                                  a.STRINGVALUE
                     );
    

    Verify that the DELETE performed correctly:

    SELECT a.ITEM_NUMBER,
        a.CHARACTERISTIC,
        a.INTVALUE,
        a.FLOATVALUE,
        a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
    FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
    GROUP BY a.ITEM_NUMBER,
        a.CHARACTERISTIC,
        a.INTVALUE,
        a.FLOATVALUE,
        a.STRINGVALUE
    ORDER BY COUNT(*) DESC 
    

    The result should contain no rows with a count greater than 1. Finally, remove the RowID column:

    ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;
    
  • 2020-11-22 01:52

    Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444

    In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:

    SET rowcount 1;
    DELETE FROM t1 WHERE myprimarykey=1;
    SET rowcount 0;  -- reset so later statements are not limited
    

    myprimarykey is the identifier for the row.

    I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1.
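
    Microsoft's documentation advises against relying on SET ROWCOUNT for DELETE, INSERT, and UPDATE statements and recommends TOP instead. A sketch of the same idea with TOP, reusing the t1 and myprimarykey names from above and assuming exactly two copies of the row exist:

```sql
-- Delete one of the two rows whose myprimarykey is 1.
-- Which of the two is removed is arbitrary.
DELETE TOP (1)
FROM t1
WHERE myprimarykey = 1;
```

    With three copies, TOP (2) would play the role that rowcount 2 plays in the description above.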

    Hope it helps someone.
