How to delete duplicates on a MySQL table?

遇见更好的自我 2020-11-22 01:35

I need to delete the duplicate rows for a specified sid in a MySQL table.

How can I do this with an SQL query?

25 answers
  • 2020-11-22 02:05

    This works for large tables:

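     -- collect one id (the highest) for every url that occurs more than once
     CREATE TEMPORARY TABLE duplicates AS
         SELECT MAX(id) AS id, url FROM links GROUP BY url HAVING COUNT(*) > 1;

     -- delete exactly those rows from the original table
     DELETE l FROM links l INNER JOIN duplicates ld ON ld.id = l.id;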
    

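    This deletes the newest row (the one with the highest id) of each duplicated url. To delete the oldest row instead, change MAX(id) to MIN(id).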
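
    If a url can occur more than twice, the statements above remove only one row per url on each run, because the temporary table holds a single id per url. A minimal sketch of a one-pass variant follows. The table name keepers is hypothetical, and the sketch assumes that id is unique and that the row with the lowest id is the one to keep. Rows whose url is NULL never match the join and are left alone.

    ```sql
    -- keepers (hypothetical name) holds the single id to keep for every url
    CREATE TEMPORARY TABLE keepers AS
        SELECT MIN(id) AS id, url FROM links GROUP BY url;

    -- delete every row that shares a url with a keeper but is not the keeper
    DELETE l FROM links l
        INNER JOIN keepers k ON k.url = l.url
        WHERE l.id <> k.id;

    DROP TEMPORARY TABLE keepers;
    ```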

  • 2020-11-22 02:06

    The following works for any table, provided the duplicates are identical in every column (SELECT DISTINCT * compares whole rows):

    CREATE TABLE `noDup` LIKE `Dup` ;
    INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
    DROP TABLE `Dup` ;
    ALTER TABLE `noDup` RENAME `Dup` ;
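
    As an alternative to the DROP/ALTER pair above, which leaves no way back once `Dup` is gone, the names can be swapped with RENAME TABLE. This is only a sketch: it assumes there is enough disk space to hold both tables for a while, and `Dup_old` is a hypothetical name for the retained original.

    ```sql
    CREATE TABLE `noDup` LIKE `Dup`;
    INSERT INTO `noDup` SELECT DISTINCT * FROM `Dup`;

    -- both renames happen in a single statement
    RENAME TABLE `Dup` TO `Dup_old`, `noDup` TO `Dup`;

    -- compare the row counts before dropping the old copy
    SELECT (SELECT COUNT(*) FROM `Dup_old`) AS before_rows,
           (SELECT COUNT(*) FROM `Dup`)     AS after_rows;

    -- once satisfied with the result:
    -- DROP TABLE `Dup_old`;
    ```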
    
  • 2020-11-22 02:06

    I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very understandable.

    As I stated in my comment, though, that solution hasn't been properly explained, so here is my version, based on it.

    1) add a new boolean column

    alter table mytable add tokeep boolean;
    

    2) add a constraint on the duplicated columns AND the new column

    alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);
    

    3) set the boolean column to true. Because of the new constraint, this succeeds on only one row in each group of duplicates; UPDATE IGNORE skips the rows that would violate the constraint instead of aborting

    update ignore mytable set tokeep = true;
    

    4) delete rows that have not been marked as tokeep

    delete from mytable where tokeep is null;
    

    5) drop the added column

    alter table mytable drop tokeep;
    

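    Dropping the column also removes it from the preventdupe index, which then covers only (mycol1, mycol2). I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.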
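
    Before relying on the result, it may be worth confirming that no duplicates are left and checking which columns the index still covers. A small sketch using the same placeholder names as the steps above (mytable, mycol1, mycol2, preventdupe):

    ```sql
    -- should return no rows once the duplicates are gone
    SELECT mycol1, mycol2, COUNT(*) AS copies
    FROM mytable
    GROUP BY mycol1, mycol2
    HAVING COUNT(*) > 1;

    -- lists the columns that the preventdupe index currently covers
    SHOW INDEX FROM mytable WHERE Key_name = 'preventdupe';
    ```

    One caveat: a MySQL unique index allows multiple NULL values, so rows where mycol1 or mycol2 is NULL all survive step 3 and can still show up in the first query.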

  • 2020-11-22 02:06

    I love @eric's answer, but it doesn't seem to work on a really big table. When I try to run it I get "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay". So I limited the join query to consider only the duplicate rows and ended up with:

    DELETE a FROM penguins a
        LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
            FROM penguins
            GROUP BY foo HAVING num > 1) b
            ON a.baz != b.keepBaz
            AND a.foo = b.foo
        WHERE b.foo IS NOT NULL;
    

    The WHERE clause makes MySQL skip any row that has no duplicate, and the join condition a.baz != b.keepBaz skips the first instance of each duplicate, so only the subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.
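
    Since a DELETE is hard to undo, one option is to run the same join as a SELECT first and inspect the rows it would remove. This sketch assumes that foo is the column that defines a duplicate and that baz is the column that orders the rows within each group.

    ```sql
    -- optional: if the server rejects the join with the MAX_JOIN_SIZE error,
    -- the session setting named in the error message lifts the limit
    -- SET SQL_BIG_SELECTS = 1;

    -- preview: every row returned here is a row the DELETE would remove
    SELECT a.*
    FROM penguins a
        INNER JOIN (SELECT MIN(baz) AS keepBaz, foo
            FROM penguins
            GROUP BY foo HAVING COUNT(*) > 1) b
            ON a.foo = b.foo
            AND a.baz != b.keepBaz;
    ```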

  • 2020-11-22 02:06

    There are just a few basic steps when removing duplicate data from your table:

    • Back up your table!
    • Find the duplicate rows
    • Remove the duplicate rows

    Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
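
    The three steps above could be sketched roughly as follows. The table contacts, its columns id and email, and the backup name contacts_backup are all hypothetical; the sketch assumes id is unique and that the row with the lowest id in each group should be kept.

    ```sql
    -- 1. back up the table (hypothetical names throughout)
    CREATE TABLE contacts_backup LIKE contacts;
    INSERT INTO contacts_backup SELECT * FROM contacts;

    -- 2. find the duplicated values and how often each occurs
    SELECT email, COUNT(*) AS copies
    FROM contacts
    GROUP BY email
    HAVING COUNT(*) > 1;

    -- 3. remove every row for which an older row with the same email exists
    DELETE c FROM contacts c
        INNER JOIN contacts k ON k.email = c.email AND k.id < c.id;
    ```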

  • 2020-11-22 02:08

    I think this will work: it copies the table, empties the original, and then puts only the distinct rows back. Please double-check it before running it on large amounts of data.

    Creates a carbon copy of your table

    create table temp_table like oldtablename;
    insert temp_table select * from oldtablename;

    Empties your original table

    DELETE FROM oldtablename;

    Copies all distinct values from the copied table back to your original table

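    INSERT oldtablename SELECT * from temp_table group by firstname,lastname,dob;

    Deletes your temp table.

    DROP TABLE temp_table;

    You need to group by all the fields that you want to keep distinct. Note that with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), SELECT * with a partial GROUP BY is rejected unless the remaining columns are functionally dependent on the grouped ones.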
