SQL query to find duplicate rows, in any table

后端 未结 4 1885
隐瞒了意图╮
隐瞒了意图╮ 2021-01-18 18:12

I\'m looking for a schema-independent query. That is, if I have a users table or a purchases table, the query should be equally capable of catchin

相关标签:
4条回答
  • 2021-01-18 18:33

    I have done this using CTEs in SQL Server.

    Here is a sample on how to delete dupes but you should be able to adapt it easily to find dupes:

    WITH CTE (COl1, Col2, DuplicateCount)
    AS
    (
        SELECT COl1,Col2,
        ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
        FROM DuplicateRcordTable
    )
    DELETE
    FROM CTE
    WHERE DuplicateCount > 1
    GO
    

    Here is a link to an article where I got the SQL:

    http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/

    0 讨论(0)
  • 2021-01-18 18:45

    I'm still confused about what "detecting them might be" but I'll give it a shot.

    Excluding them is easy

    e.g.

    SELECT DISTINCT * FROM USERS
    

    However if you wanted to only include them and a duplicate is all the fields than you have to do

    SELECT 
       [Each and every field]
    FROM
       USERS
    GROUP BY
       [Each and every field]
    HAVING COUNT(*) > 1  
    

    You can't get away with just using (*) because you can't GROUP BY * so this requirement from your comments is difficult

    a schema-independent means I don't want to specify all of the columns in the query

    Unless that is you want to use dynamic SQL and read the columns from sys.columns or information_schema.columns

    For example

    DECLARE @colunns nvarchar(max)
    SET  @colunns = ''
    
    SELECT @colunns = @colunns  + '[' +  COLUMN_NAME  +'], ' 
    FROM INFORMATION_SCHEMA.columns  
    WHERE table_name = 'USERS'
    
    SET  @colunns  = left(@colunns,len(@colunns ) - 1)
    
    
    DECLARE @SQL nvarchar(max)
    SET @SQL = 'SELECT '  + @colunns 
              + 'FROM  USERS' + 'GROUP BY ' 
              + @colunns 
               + ' Having Count(*) > 1'
    
    
    exec sp_executesql @SQL
    

    Please note you should read this The Curse and Blessings of Dynamic SQL if you haven't already

    0 讨论(0)
  • 2021-01-18 18:46

    I believe that this should work for you. Keep in mind that CHECKSUM() isn't 100% perfect - it's theoretically possible to get a false positive here (I think), but otherwise you can just change the table name and this should work:

    ;WITH cte AS (
        SELECT
            *,
            CHECKSUM(*) AS chksum,
            ROW_NUMBER() OVER(ORDER BY GETDATE()) AS row_num
        FROM
            My_Table
    )
    SELECT
        *
    FROM
        CTE T1
    INNER JOIN CTE T2 ON
        T2.chksum = T1.chksum AND
        T2.row_num <> T1.row_num
    

    The ROW_NUMBER() is needed so that you have some way of distinguishing rows. It requires an ORDER BY and that can't be a constant, so GETDATE() was my workaround for that.

    Simply change the table name in the CTE and it should work without spelling out the columns.

    0 讨论(0)
  • 2021-01-18 18:50

    I recently was looking into the same issue and noticed this question. I managed to solve it using a stored procedure with some dynamic SQL. This way you only need to specify the table name. And it will get all the other relevant data from sys tables.

    /*
    This SP returns all duplicate rows (1 line for each duplicate) for any given table.
    
    to use the SP:
    exec [database].[dbo].[sp_duplicates] 
        @table = '[database].[schema].[table]'  
    
    */
    create proc dbo.sp_duplicates @table nvarchar(50) as
    
    declare @query nvarchar(max)
    declare @groupby nvarchar(max)
    
    set @groupby =  stuff((select ',' + [name]
                    FROM sys.columns
                    WHERE object_id = OBJECT_ID(@table)
                    FOR xml path('')), 1, 1, '')
    
    set @query = 'select *, count(*)
                    from '+@table+'
                    group by '+@groupby+'
                    having count(*) > 1'
    
    exec (@query)
    
    0 讨论(0)
提交回复
热议问题