How to delete duplicate rows in SQL Server?

长情又很酷 2020-11-22 00:58

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1          


        
23 Answers
  • 2020-11-22 01:31
    -- Keep the row with the lowest id for each url; delete every other row.
    DELETE FROM search
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM search
        GROUP BY url
    )
    
  • 2020-11-22 01:31

    Here is another way to delete the duplicates.

    Declare @table table
    (col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
    Insert into @table values 
    ('john',1,1,1,1,1,1),
    ('john',1,1,1,1,1,1),
    ('sally',2,2,2,2,2,2),
    ('sally',2,2,2,2,2,2)
    

    The statements above create a sample table variable named @table and load it with the data from the question.

    Delete  aliasName from (
    Select  *,
            ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
    From    @table) aliasName 
    Where   rowNumber > 1
    
    Select * from @table
    

    Note: If you list all columns in the PARTITION BY clause, the ORDER BY column has little significance, because every row in a partition is identical.

    I know the question was asked three years ago, and my answer is another version of what Tim has posted, but I'm posting it just in case it is helpful to anyone.
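
To try the ROW_NUMBER() idea without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is an assumption-laden analogue, not the T-SQL above: the `people` table and its three columns are hypothetical, SQLite needs version 3.25 or later for window functions, and because SQLite cannot delete through a derived table, the sketch identifies rows by SQLite's implicit `rowid` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no unique id column, as in the question.
conn.execute("CREATE TABLE people (col1 TEXT, col2 INT, col3 INT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("john", 1, 1), ("john", 1, 1), ("sally", 2, 2), ("sally", 2, 2)],
)

# Number the rows inside each group of identical values, then delete
# every row whose number is greater than 1. The rowid column is
# SQLite-specific and replaces the "delete from the derived table" trick.
conn.execute(
    """
    DELETE FROM people
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3
                       ORDER BY rowid
                   ) AS rn
            FROM people
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute("SELECT * FROM people ORDER BY col1").fetchall()
print(remaining)
```

One row per distinct combination of values should remain, which mirrors what the `Where rowNumber > 1` filter does in the SQL Server version.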

  • 2020-11-22 01:31

    There are two ways to do this in MySQL:

    A) Delete duplicate rows using DELETE JOIN statement

    DELETE t1 FROM contacts t1
    INNER JOIN contacts t2 
    WHERE 
        t1.id < t2.id AND 
        t1.email = t2.email;
    

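    This query references the contacts table twice, so it uses the table aliases t1 and t2. Because it deletes every row that has a duplicate with a higher id, the row with the highest id in each group is kept.

    The output is:

    Query OK, 4 rows affected (0.10 sec)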

    If you want to delete duplicate rows and keep the lowest id instead, use the following statement:

    DELETE c1 FROM contacts c1
    INNER JOIN contacts c2 
    WHERE
        c1.id > c2.id AND 
        c1.email = c2.email;
    

       

    B) Delete duplicate rows using an intermediate table

    The following shows the steps for removing duplicate rows using an intermediate table:

        1. Create a new table with the same structure as the original table whose duplicate rows you want to delete.

        2. Insert the distinct rows from the original table into the intermediate table.

        3. Drop the original table and rename the intermediate table to the original table's name.

     

    Step 1. Create a new table whose structure is the same as the original table:

    CREATE TABLE source_copy LIKE source;
    

    Step 2. Insert distinct rows from the original table to the new table:

    INSERT INTO source_copy
    SELECT * FROM source
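    GROUP BY col; -- column that has duplicate values; SELECT * with GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5)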
    

    Step 3. Drop the original table and rename the intermediate table to the original table's name:

    DROP TABLE source;
    ALTER TABLE source_copy RENAME TO source;
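
The three steps above can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: it runs on SQLite rather than MySQL, the `source` table with `name` and `email` columns is hypothetical, and it swaps `GROUP BY col` for `SELECT DISTINCT *`, which removes rows that are identical in every column. SQLite has no `CREATE TABLE ... LIKE`, so steps 1 and 2 are combined into `CREATE TABLE ... AS SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical original table containing one fully duplicated row.
conn.execute("CREATE TABLE source (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [
        ("ann", "a@example.com"),
        ("ann", "a@example.com"),
        ("bob", "b@example.com"),
    ],
)

# Steps 1 and 2: create the intermediate table from the distinct rows.
conn.execute("CREATE TABLE source_copy AS SELECT DISTINCT * FROM source")
# Step 3: drop the original and rename the copy into its place.
conn.execute("DROP TABLE source")
conn.execute("ALTER TABLE source_copy RENAME TO source")

rows = conn.execute("SELECT name, email FROM source ORDER BY name").fetchall()
print(rows)
```

Note that a table created with `CREATE TABLE ... AS SELECT` does not inherit indexes or constraints from the original, so those would have to be re-created separately.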
    

    Source: http://www.mysqltutorial.org/mysql-delete-duplicate-rows/

  • 2020-11-22 01:31

    I tried the solutions suggested above, and they work for small and medium tables. For very large tables I can suggest the following solution, since it runs in iterations.

    1. Drop all views that depend on LargeSourceTable. You can find the dependencies in SQL Server Management Studio: right-click the table and click "View Dependencies".

    2. Rename the table:

      EXEC sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp';
      GO

    3. Create LargeSourceTable again, this time adding a primary key over the columns that define a duplicate, with WITH (IGNORE_DUP_KEY = ON). For example:

      CREATE TABLE [dbo].[LargeSourceTable]
      (
          ID int IDENTITY(1,1),
          [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
          [Column1] CHAR (36) NOT NULL,
          [Column2] NVARCHAR (100) NOT NULL,
          [Column3] CHAR (36) NOT NULL,
          PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
      );
      GO

    4. Re-create the views that you dropped in step 1 so that they point at the newly created table.

    5. Run the following SQL script. It copies the rows in pages of 1,000,000 rows and reports progress after each page; lower the page size to see progress more often.

    Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-incrementing id, which I'm also copying.

    SET IDENTITY_INSERT LargeSourceTable ON

    DECLARE @PageNumber AS INT, @RowspPage AS INT
    DECLARE @TotalRows AS INT
    DECLARE @dt varchar(19)

    SET @PageNumber = 0
    SET @RowspPage = 1000000
    SELECT @TotalRows = count(*) FROM LargeSourceTable_TEMP

    -- @PageNumber starts at 0, so stop once the page offset reaches the row count
    While (@PageNumber * @RowspPage < @TotalRows)
    Begin
        begin transaction tran_inner
            ; with cte as
            (
                SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
                OFFSET ((@PageNumber) * @RowspPage) ROWS
                FETCH NEXT @RowspPage ROWS ONLY
            )
    
            INSERT INTO LargeSourceTable 
            (
                 ID                     
                ,[CreateDate]       
                ,[Column1]   
                ,[Column2] 
                ,[Column3]       
            )       
            select 
                 ID                     
                ,[CreateDate]       
                ,[Column1]   
                ,[Column2] 
                ,[Column3]       
            from cte
    
        commit transaction tran_inner
    
        PRINT 'Page: ' + convert(varchar(10), @PageNumber)
        PRINT 'Transferred: ' + convert(varchar(20), @PageNumber * @RowspPage)
        PRINT 'Of: ' + convert(varchar(20), @TotalRows)
    
        SELECT @dt = convert(varchar(19), getdate(), 121)
        RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
        SET @PageNumber = @PageNumber + 1
    End
    

    SET IDENTITY_INSERT LargeSourceTable OFF
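
The same idea, copying rows in batches into a table whose key silently rejects duplicates, can be sketched with Python's built-in sqlite3 module. Everything here is an assumption for illustration: the tables `large_temp` and `large_source` are hypothetical, SQLite's `ON CONFLICT IGNORE` clause stands in for SQL Server's `IGNORE_DUP_KEY = ON`, and the loop pages by "id greater than the last id seen" rather than by OFFSET/FETCH as the script above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical renamed source table that still contains duplicates.
conn.execute("CREATE TABLE large_temp (id INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO large_temp VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "b", "y"), (5, "c", "z")],
)

# The key over (column1, column2) drops conflicting rows without raising
# an error, which is the role IGNORE_DUP_KEY = ON plays in the answer.
conn.execute(
    """
    CREATE TABLE large_source (
        id INTEGER,
        column1 TEXT NOT NULL,
        column2 TEXT NOT NULL,
        PRIMARY KEY (column1, column2) ON CONFLICT IGNORE
    )
    """
)

batch_size = 2  # hypothetical page size; the answer uses 1,000,000
last_id = 0
while True:
    batch = conn.execute(
        "SELECT id, column1, column2 FROM large_temp "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not batch:
        break
    conn.executemany("INSERT INTO large_source VALUES (?, ?, ?)", batch)
    conn.commit()  # one transaction per batch, as in the script above
    last_id = batch[-1][0]

kept = conn.execute(
    "SELECT id, column1, column2 FROM large_source ORDER BY id"
).fetchall()
print(kept)
```

Committing per batch keeps each transaction small, which is the main reason for iterating instead of copying the whole table in one statement.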

  • 2020-11-22 01:32

    Without using a CTE and ROW_NUMBER(), you can delete the duplicate records by using GROUP BY with the MAX function. Here is an example:

    DELETE
    FROM MyDuplicateTable
    WHERE ID NOT IN
    (
    SELECT MAX(ID)
    FROM MyDuplicateTable
    GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
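
As a quick check of which rows this pattern keeps, here is a small sketch using Python's built-in sqlite3 module. It assumes a hypothetical table `my_duplicate_table` with an auto-assigned `id` and two duplicate-defining columns `c1` and `c2`; the DELETE statement has the same shape as the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is assigned automatically as 1, 2, 3, 4.
conn.execute(
    "CREATE TABLE my_duplicate_table (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT)"
)
conn.executemany(
    "INSERT INTO my_duplicate_table (c1, c2) VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")],
)

# Keep only the row with the highest id in each (c1, c2) group.
deleted = conn.execute(
    """
    DELETE FROM my_duplicate_table
    WHERE id NOT IN (
        SELECT MAX(id) FROM my_duplicate_table GROUP BY c1, c2
    )
    """
).rowcount

survivors = conn.execute(
    "SELECT id, c1, c2 FROM my_duplicate_table ORDER BY id"
).fetchall()
print(deleted, survivors)
```

The approach depends on a unique id column, so for a table like the one in the question, which has none, such a column would have to be added first.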
    
  • 2020-11-22 01:36

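    This might help in your case ([table] is a placeholder for your table name). It keeps the row with the lowest id for each col1 value:

    DELETE t1 FROM [table] t1 INNER JOIN [table] t2 ON t1.col1 = t2.col1 AND t1.id > t2.id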
    