Finding duplicate values in a SQL table

南旧 2020-11-21 13:18

It's easy to find duplicates with one field:

SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have

30 answers
  • 2020-11-21 14:01

    Try this:

    declare @YourTable table (id int, name varchar(10), email varchar(50))
    
    INSERT @YourTable VALUES (1,'John','John-email')
    INSERT @YourTable VALUES (2,'John','John-email')
    INSERT @YourTable VALUES (3,'fred','John-email')
    INSERT @YourTable VALUES (4,'fred','fred-email')
    INSERT @YourTable VALUES (5,'sam','sam-email')
    INSERT @YourTable VALUES (6,'sam','sam-email')
    
    SELECT
        name,email, COUNT(*) AS CountOf
        FROM @YourTable
        GROUP BY name,email
        HAVING COUNT(*)>1
    

    OUTPUT:

    name       email       CountOf
    ---------- ----------- -----------
    John       John-email  2
    sam        sam-email   2
    
    (2 row(s) affected)
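
For readers not on SQL Server, here is a minimal sketch of the same grouped count run against SQLite through Python's built-in `sqlite3` module. Table variables are T-SQL only, so it assumes a hypothetical table `your_table` standing in for `@YourTable`, loaded with the same six rows. The added `ORDER BY` only makes the output order predictable.

```python
import sqlite3

# Sample rows copied from the @YourTable example in this answer.
ROWS = [
    (1, "John", "John-email"),
    (2, "John", "John-email"),
    (3, "fred", "John-email"),
    (4, "fred", "fred-email"),
    (5, "sam", "sam-email"),
    (6, "sam", "sam-email"),
]

conn = sqlite3.connect(":memory:")
# your_table is a hypothetical stand-in for the T-SQL table variable @YourTable.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)

# Group on both columns and keep only combinations that occur more than once.
dupes = conn.execute(
    """
    SELECT name, email, COUNT(*) AS count_of
    FROM your_table
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name, email  -- only to make the output order predictable
    """
).fetchall()

for name, email, count_of in dupes:
    print(name, email, count_of)
```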
    

    If you want the IDs of the duplicates, use this:

    SELECT
        y.id,y.name,y.email
        FROM @YourTable y
            INNER JOIN (SELECT
                            name,email, COUNT(*) AS CountOf
                            FROM @YourTable
                            GROUP BY name,email
                            HAVING COUNT(*)>1
                        ) dt ON y.name=dt.name AND y.email=dt.email
    

    OUTPUT:

    id          name       email
    ----------- ---------- ------------
    1           John       John-email
    2           John       John-email
    5           sam        sam-email
    6           sam        sam-email
    
    (4 row(s) affected)
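
A different technique, swapped in here rather than taken from the answer: a window-function count can replace the join against the grouped derived table, because `COUNT(*) OVER (PARTITION BY name, email)` attaches each row's group size to the row itself. This is a minimal sketch in SQLite via Python's `sqlite3`. It assumes the same hypothetical `your_table` and rows, and a SQLite build of 3.25 or later, the first release with window functions. SQL Server also accepts `COUNT(*) OVER (PARTITION BY ...)`.

```python
import sqlite3

ROWS = [
    (1, "John", "John-email"),
    (2, "John", "John-email"),
    (3, "fred", "John-email"),
    (4, "fred", "fred-email"),
    (5, "sam", "sam-email"),
    (6, "sam", "sam-email"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for @YourTable.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)

# The window count gives every row the size of its (name, email) group,
# so the outer query can filter without a self-join.
dupe_rows = conn.execute(
    """
    SELECT id, name, email
    FROM (
        SELECT id, name, email,
               COUNT(*) OVER (PARTITION BY name, email) AS count_of
        FROM your_table
    ) AS t
    WHERE count_of > 1
    ORDER BY id
    """
).fetchall()

for row in dupe_rows:
    print(row)
```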
    

    To delete the duplicates, try:

    DELETE d
        FROM @YourTable d
            INNER JOIN (SELECT
                            y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                            FROM @YourTable y
                                INNER JOIN (SELECT
                                                name,email, COUNT(*) AS CountOf
                                                FROM @YourTable
                                                GROUP BY name,email
                                                HAVING COUNT(*)>1
                                            ) dt ON y.name=dt.name AND y.email=dt.email
                       ) dt2 ON d.id=dt2.id
            WHERE dt2.RowRank!=1
    SELECT * FROM @YourTable
    

    OUTPUT:

    id          name       email
    ----------- ---------- --------------
    1           John       John-email
    3           fred       John-email
    4           fred       fred-email
    5           sam        sam-email
    
    (4 row(s) affected)
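
A shorter alternative, which is not this answer's method, is to keep the lowest `id` of each `(name, email)` group and delete everything else. This sketch runs it in SQLite via Python's `sqlite3` with the same hypothetical `your_table`. On these rows it leaves the same four rows as the output shown.

```python
import sqlite3

ROWS = [
    (1, "John", "John-email"),
    (2, "John", "John-email"),
    (3, "fred", "John-email"),
    (4, "fred", "fred-email"),
    (5, "sam", "sam-email"),
    (6, "sam", "sam-email"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for @YourTable.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)

# Keep the lowest id in every (name, email) group; delete every other row.
conn.execute(
    """
    DELETE FROM your_table
    WHERE id NOT IN (
        SELECT MIN(id) FROM your_table GROUP BY name, email
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT id, name, email FROM your_table ORDER BY id").fetchall()
for row in remaining:
    print(row)
```

MySQL rejects this exact form with error 1093 because the subquery reads from the table being deleted from. Wrapping the subquery in a derived table is the usual workaround there.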
    
  • 2020-11-21 14:01
    select name, email, COUNT(*) from users group by name, email having COUNT(*) > 1
    
  • 2020-11-21 14:02
    select name, email
    , case 
    when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
    else 'No'
    end "duplicated ?"
    from users
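
Here is a runnable sketch of this flagging query in SQLite (3.25+ for window functions) via Python's `sqlite3`. It assumes a hypothetical `users` table with an `id` column, loaded with the first answer's sample rows. One deliberate change: each partition is ordered by `id` instead of `name`. Inside a `(name, email)` partition `name` is constant, so ordering by it leaves the engine free to choose which copy is marked 'No'.

```python
import sqlite3

ROWS = [
    (1, "John", "John-email"),
    (2, "John", "John-email"),
    (3, "fred", "John-email"),
    (4, "fred", "fred-email"),
    (5, "sam", "sam-email"),
    (6, "sam", "sam-email"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical users table; the question's real schema is not shown.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)

# The first row of each (name, email) partition (lowest id) is 'No';
# every later copy is 'Yes'.
flags = conn.execute(
    """
    SELECT id, name, email,
           CASE
               WHEN ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) > 1
               THEN 'Yes'
               ELSE 'No'
           END AS is_duplicate
    FROM users
    ORDER BY id
    """
).fetchall()

for row in flags:
    print(row)
```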
    
  • 2020-11-21 14:02
    SELECT * FROM users u WHERE rowid <> (SELECT MAX(rowid) FROM users u1 WHERE
    u.email = u1.email);
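
SQLite also gives ordinary tables an implicit `rowid`, so the correlated-subquery idea can be tried there. This minimal sketch uses Python's `sqlite3` and assumes a hypothetical two-column `users` table holding the first answer's sample rows. Comparing each row's `rowid` with the per-email `MAX(rowid)` using `=` keeps exactly one row per email. Using `<>` returns the surplus copies instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table; SQLite assigns rowid 1, 2, 3, ... on insert.
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("John", "John-email"),
        ("John", "John-email"),
        ("fred", "John-email"),
        ("fred", "fred-email"),
        ("sam", "sam-email"),
        ("sam", "sam-email"),
    ],
)

# {op} is filled from a fixed string below, never from user input.
QUERY = """
    SELECT u.rowid, u.name, u.email
    FROM users AS u
    WHERE u.rowid {op} (
        SELECT MAX(u1.rowid) FROM users AS u1 WHERE u1.email = u.email
    )
    ORDER BY u.rowid
"""

# '=' keeps one row (the highest rowid) per email.
keepers = conn.execute(QUERY.format(op="=")).fetchall()
# '<>' returns every other copy, i.e. the surplus duplicates.
surplus = conn.execute(QUERY.format(op="<>")).fetchall()

print("keep:", keepers)
print("surplus:", surplus)
```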
    
  • 2020-11-21 14:03
    SELECT
        name, email, COUNT(*)
    FROM
        users
    GROUP BY
        name, email
    HAVING 
        COUNT(*) > 1
    

    Simply group on both of the columns.
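
To show why grouping on both columns matters, this sketch runs the question's one-column grouping next to the two-column one. It uses SQLite via Python's `sqlite3`, a hypothetical `users` table, and the sample rows from the first answer. Fred's row that shares `John-email` counts toward the `email`-only group but not toward any `(name, email)` group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical users table with the first answer's sample rows.
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("John", "John-email"),
        ("John", "John-email"),
        ("fred", "John-email"),
        ("fred", "fred-email"),
        ("sam", "sam-email"),
        ("sam", "sam-email"),
    ],
)

# One column: every row sharing an email counts, whatever the name.
by_email = conn.execute(
    "SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1 ORDER BY email"
).fetchall()

# Two columns: only rows with the same name AND email count together.
by_both = conn.execute(
    "SELECT name, email, COUNT(*) FROM users "
    "GROUP BY name, email HAVING COUNT(*) > 1 ORDER BY name, email"
).fetchall()

print(by_email)
print(by_both)
```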

    Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency":

    In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

    Support is not consistent:

    • Recent PostgreSQL supports it.
    • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
    • MySQL is unpredictable and you need sql_mode=only_full_group_by:
      • GROUP BY lname ORDER BY showing wrong results;
      • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
    • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
  • 2020-11-21 14:03
    SELECT name, email
    FROM users
    WHERE email IN
        (SELECT email FROM users
         GROUP BY email
         HAVING COUNT(*) > 1)
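
The `IN` subquery above matches on `email` alone. To require both columns to match, as the question asks, a correlated `EXISTS` is one portable option. It is swapped in here and is not this answer's code. The sketch runs it in SQLite via Python's `sqlite3`, assuming a hypothetical `users` table with an `id` key and the first answer's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical users table with a unique id per row.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# A row is a duplicate if some *other* row (different id) has the same
# name and email.
dupe_rows = conn.execute(
    """
    SELECT id, name, email
    FROM users AS u
    WHERE EXISTS (
        SELECT 1 FROM users AS u2
        WHERE u2.name = u.name
          AND u2.email = u.email
          AND u2.id <> u.id
    )
    ORDER BY id
    """
).fetchall()

for row in dupe_rows:
    print(row)
```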
    