How to remove duplicate entries from a mysql db?

后端 未结 8 1296
南旧
南旧 2020-12-02 10:41

I have a table with some ids + titles. I want to make the title column unique, but it has over 600k records already, some of which are duplicates (sometimes several dozen ti

相关标签:
8条回答
  • 2020-12-02 10:57

    Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.

    CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
    

    More specifically:
    The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.

    CREATE TABLE tempTableName LIKE tableName;  
    CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);  
    INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;  
    DROP TABLE tableName;  
    INSERT tableName SELECT * FROM tempTableName;  
    DROP TABLE tempTableName;  
    
    0 讨论(0)
  • 2020-12-02 10:57

    Since the MySql ALTER IGNORE TABLE has been deprecated, you need to actually delete the duplicate date before adding an index.

    First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.

    SELECT
        s1.email
        s1.id, 
        s1.created
        s2.id,
        s2.created 
    FROM 
        student AS s1 
    INNER JOIN 
        student AS s2 
    WHERE 
        /* Emails are the same */
        s1.email = s2.email AND
        /* DON'T select both accounts,
           only select the one created later.
           The serial id could also be used here */
        s2.created > s1.created 
    ;
    

    Next select only the unique duplicate ids:

    SELECT 
        DISTINCT s2.id
    FROM 
        student AS s1 
    INNER JOIN 
        student AS s2 
    WHERE 
        s1.email = s2.email AND
        s2.created > s1.created 
    ;
    

    Once you are sure that only contains the duplicate ids you want to delete, run the delete. You have to add (SELECT * FROM tblname) so that MySql doesn't complain.

    DELETE FROM
        student 
    WHERE
        id
    IN (
        SELECT 
            DISTINCT s2.id
        FROM 
            (SELECT * FROM student) AS s1 
        INNER JOIN 
            (SELECT * FROM student) AS s2 
        WHERE 
            s1.email = s2.email AND
            s2.created > s1.created 
    );
    

    Then create the unique index:

    ALTER TABLE
        student
    ADD UNIQUE INDEX
        idx_student_unique_email(email)
    ;
    
    0 讨论(0)
  • 2020-12-02 10:58
    delete from student where id in (
    SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
    where s1.`sex` = s2.`sex` and
    s1.`student_id` > s2.`student_id` and
    s1.`sex` = 'M'
        ORDER BY `s1`.`student_id` ASC
    )
    
    0 讨论(0)
  • 2020-12-02 11:00

    The solution posted by Nitin seems to be the most elegant / logical one.

    However it has one issue:

    ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause

    This can however be resolved by using (SELECT * FROM student) instead of student:

    DELETE FROM student WHERE id IN (
    SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
    WHERE s1.`sex` = s2.`sex` AND
    s1.`student_id` > s2.`student_id` AND
    s1.`sex` = 'M'
    ORDER BY `s1`.`student_id` ASC
    )
    

    Give your +1's to Nitin for coming up with the original solution.

    0 讨论(0)
  • 2020-12-02 11:04

    This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.

    ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title); 
    

    Edit: Note that this command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)

    0 讨论(0)
  • 2020-12-02 11:05

    This shows how to do it in SQL2000. I'm not completely familiar with MySQL syntax but I'm sure there's something comparable

    create table #titles (iid int identity (1, 1), title varchar(200))
    
    -- Repeat this step many times to create duplicates
    insert into #titles(title) values ('bob')
    insert into #titles(title) values ('bob1')
    insert into #titles(title) values ('bob2')
    insert into #titles(title) values ('bob3')
    insert into #titles(title) values ('bob4')
    
    
    DELETE T  FROM 
    #titles T left join 
    (
      select title, min(iid) as minid from #titles group by title
    ) D on T.title = D.title and T.iid = D.minid
    WHERE D.minid is null
    
    Select * FROM #titles
    
    0 讨论(0)
提交回复
热议问题