How to delete duplicate rows in SQL Server 2008?

栀梦 2021-01-06 18:26

How can I delete duplicate rows in SQL Server 2008?

4 answers
  • 2021-01-06 19:00

    Even if you don't have a primary key, you can delete the duplicate rows with the code below:

    delete from tablename
              where tablename.%%physloc%%
              NOT IN (select MIN(b.%%physloc%%)
              from tablename b
              group by b.column1, b.column2, b.column3
              );
    
  • 2021-01-06 19:09

    Assuming you have a primary key called id, the other columns are col2 ... coln, and by "duplicate" rows you mean rows where every column value except the PK is repeated:

    delete from A where id not in
    (select min(id) from A
    group by col2, col3, ...coln)
    

    i.e. group on all non-PK columns

  • 2021-01-06 19:18

    Add a primary key. Seriously, every table should have one. It can be an identity and you can ignore it, but make sure that every single table has a primary key defined.
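
    If the table already exists without a key, one way to add one is sketched below. The names dbo.MyTable and PK_MyTable are hypothetical placeholders, not taken from the question. SQL Server fills in identity values for the existing rows when the column is added, though the order in which it numbers them is not guaranteed.

    ```sql
    -- Hypothetical table and constraint names, for illustration only.
    -- Adds an auto-numbered surrogate key and makes it the primary key.
    ALTER TABLE dbo.MyTable
        ADD id int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_MyTable PRIMARY KEY;
    ```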

    Imagine that you have a table like:

    create table T (
        id int identity,
        colA varchar(30) not null,
        colB varchar(30) not null
    )
    

    Then you can say something like:

    delete t1
    from T t1
    where exists
    (select null from T t2
    where t2.colA = t1.colA
    and t2.colB = t1.colB
    and t2.id < t1.id)  -- keep the row with the lowest id in each group
    

    Another trick is to select out the distinct records with the minimum id, and keep those:

    delete T
    where id not in
    (select min(id) from T
    group by colA, colB)
    

    (Sorry, I haven't tested these, but one of these ideas could lead you to your solution.)

    Note that if you don't have a primary key, the only other way to do this is to leverage a pseudo-column like ROWID -- but I'm not sure if SQL Server 2008 offers that idea.

  • 2021-01-06 19:19

    The simplest way is with a CTE (common table expression). I use this method when I've got raw data to import; the first thing I do to sanitize it is to ensure there are no duplicates, so that I've got some sort of unique handle on each row.

    Summary:

    WITH numbered AS (
        SELECT ROW_NUMBER() OVER(PARTITION BY [dupe-column-list] ORDER BY [dupe-column-list]) AS _dupe_num FROM [table-name] WHERE 1=1
    )
    DELETE FROM numbered WHERE _dupe_num > 1;
    

    The "dupe-column-list" part is where you list all of the columns involved where you wish values were unique. The ORDER BY is where you decide, within a set of duplicates, which row "wins" and which gets deleted. (The "WHERE 1=1" is just a personal habit.)
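
    To make the role of the ORDER BY concrete, here is a minimal sketch that keeps the most recently loaded row in each set of duplicates. The table dbo.staging_orders and its columns customer_id, order_ref and loaded_at are assumptions made up for this illustration. If two duplicates share the same loaded_at value, which of them survives is arbitrary.

    ```sql
    -- Hypothetical table and column names, for illustration only.
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY customer_id, order_ref  -- columns that define a duplicate
                   ORDER BY loaded_at DESC              -- newest row gets _dupe_num = 1
               ) AS _dupe_num
        FROM dbo.staging_orders
    )
    DELETE FROM numbered WHERE _dupe_num > 1;  -- removes everything but the newest row
    ```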

    It works because SQL Server keeps an internal, unique reference to each source row that's selected in the CTE. So when the DELETE is executed, it knows the exact row to delete, regardless of what you put in your CTE's select list. (If you're nervous, you could change the "DELETE" to "SELECT *", but since you've got duplicate rows, it's not going to help; if you could uniquely identify each row, you wouldn't be reading this.)

    Example:

    CREATE TABLE ##_dupes (col1 int, col2 int, col3 varchar(50));
    INSERT INTO ##_dupes 
        VALUES (1, 1, 'one,one')
            , (2, 2, 'two,two')
            , (3, 3, 'three,three')
            , (1, 1, 'one,one')
            , (1, 2, 'one,two')
            , (3, 3, 'three,three')
            , (1, 1, 'one,one')
            , (1, 2, '1,2');
    

    Of the 8 rows, you have 5 involved with duplicate problems; 3 rows need to get removed. You can see the problems with this:

    SELECT col1
        , col2
        , col3
        , COUNT(1) AS _total 
        FROM ##_dupes 
        WHERE 1=1 
        GROUP BY col1, col2, col3
        HAVING COUNT(1) > 1
        ORDER BY _total DESC;
    

    Now run the following query to remove the duplicates, leaving 1 row from each set of duplicates.

    WITH numbered AS (
        SELECT ROW_NUMBER() OVER(PARTITION BY col1, col2, col3 ORDER BY col1, col2, col3) AS _dupe_num FROM ##_dupes WHERE 1=1
    )
    DELETE FROM numbered WHERE _dupe_num > 1;
    

    You are now left with 5 rows, none of which are duplicated.
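
    As a quick check, assuming the example above was run exactly as written, the following sketch re-runs the duplicate report, counts the remaining rows, and then cleans up the temp table:

    ```sql
    -- Should return no rows now that each (col1, col2, col3) combination is unique.
    SELECT col1, col2, col3, COUNT(1) AS _total
    FROM ##_dupes
    GROUP BY col1, col2, col3
    HAVING COUNT(1) > 1;

    -- Should report 5 remaining rows (8 inserted, 3 deleted).
    SELECT COUNT(*) AS remaining_rows FROM ##_dupes;

    -- Drop the global temp table when finished.
    DROP TABLE ##_dupes;
    ```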
