Finding duplicate values in a SQL table

南旧 2020-11-21 13:18

It's easy to find duplicates with one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have

30 Answers
  • 2020-11-21 13:53

    If you want to delete the duplicates, here's a much simpler way to do it than hunting for even/odd rows with a triple sub-select:

    SELECT u.id, u.name, u.email
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    

    And so to delete:

    DELETE FROM users
    WHERE id IN (
        SELECT u.id /*, u.name, u.email */
        FROM users u, users u2
        WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    )
    

    Much easier to read and understand, IMHO.

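    Note: The join matches every row that has a twin with a lower id, so a single run deletes all duplicates and keeps the lowest-id row of each group. Some databases (MySQL, for example) refuse to select from the table being deleted from inside the subquery, in which case the subquery has to be wrapped in a derived table.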
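
To check how many passes the DELETE above needs, here is a minimal sketch. It assumes SQLite through Python's standard `sqlite3` module (the answer does not name a database engine), and the sample rows and the `delete_duplicates` helper are made up for illustration.

```python
import sqlite3


def delete_duplicates(conn):
    """Run the self-join DELETE once and return how many rows it removed."""
    cur = conn.execute(
        """
        DELETE FROM users
        WHERE id IN (
            SELECT u.id
            FROM users u, users u2
            WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
        )
        """
    )
    return cur.rowcount


# Hypothetical sample data: "Tom" appears three times, "Sam" once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@example.com"),
        (2, "Tom", "tom@example.com"),
        (3, "Tom", "tom@example.com"),
        (4, "Sam", "sam@example.com"),
    ],
)

first_pass = delete_duplicates(conn)   # rows removed by the first run
second_pass = delete_duplicates(conn)  # rows removed by a repeat run
remaining = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(first_pass, second_pass, remaining)
```

In this SQLite run the first pass removes ids 2 and 3 and the second pass finds nothing left to delete. Other engines may behave differently or reject the statement, so it is worth repeating the check on the database actually in use.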

  • 2020-11-21 13:53
    SELECT name, email
    FROM users 
    WHERE email IN (
        SELECT email 
        FROM users 
        GROUP BY email 
        HAVING COUNT(email) > 1)
    
  • 2020-11-21 13:53

    SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;

  • 2020-11-21 13:55
    select emp.ename, emp.empno, dept.loc
    from emp
    inner join dept
            on dept.deptno = emp.deptno
    inner join
        (select ename, deptno, count(*) as cnt
         from emp
         group by ename, deptno
         having count(*) > 1) t
            on emp.ename = t.ename and emp.deptno = t.deptno
    order by emp.ename
    /
    
  • 2020-11-21 13:58

    You may want to try this

    SELECT NAME, EMAIL, COUNT(*)
    FROM USERS
    GROUP BY 1,2
    HAVING COUNT(*) > 1
    
  • 2020-11-21 13:59

    The key here is a fast query that also identifies which rows are duplicated. A self join works, but it is usually faster to first find the (username, email) pairs that occur more than once and then join that result back to the original table to get the ids of the duplicated rows. Finally, order by any column other than id so that duplicated rows appear next to each other.

    SELECT u.*
    FROM users AS u
    JOIN (SELECT username, email
          FROM users
          GROUP BY username, email
          HAVING COUNT(*)>1) AS w
    ON u.username=w.username AND u.email=w.email
    ORDER BY u.email;
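
As a rough illustration of the group-then-join approach, here is a small sketch that runs the same query shape against SQLite through Python's standard `sqlite3` module. The engine choice, the sample rows, and the extra `u.id` tie-breaker in the ORDER BY (added only to make the output order deterministic) are assumptions, not part of the answer.

```python
import sqlite3

# Hypothetical sample data: ids 1/3 and 2/4 share the same username and email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, username, email) VALUES (?, ?, ?)",
    [
        (1, "ann", "ann@example.com"),
        (2, "bob", "bob@example.com"),
        (3, "ann", "ann@example.com"),
        (4, "bob", "bob@example.com"),
        (5, "cat", "cat@example.com"),
        (6, "ann", "ann@other.example"),  # same username, different email
    ],
)

# Step 1 (inner query): find the (username, email) pairs that occur more than once.
# Step 2 (join): pull every row of the original table that belongs to such a pair.
# Step 3 (order): sort by email so the duplicated rows sit next to each other.
DUPLICATES_SQL = """
SELECT u.id, u.username, u.email
FROM users AS u
JOIN (SELECT username, email
      FROM users
      GROUP BY username, email
      HAVING COUNT(*) > 1) AS w
  ON u.username = w.username AND u.email = w.email
ORDER BY u.email, u.id
"""

duplicate_rows = conn.execute(DUPLICATES_SQL).fetchall()
for row in duplicate_rows:
    print(row)
```

Rows 5 and 6 are left out because their (username, email) pair occurs only once, while every member of a duplicated pair is returned together with its id.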
    