I need to DELETE duplicated rows for a specified sid in a MySQL table.
How can I do this with an SQL query?
This works for large tables:
CREATE TEMPORARY TABLE duplicates AS SELECT MAX(id) AS id, url FROM links GROUP BY url HAVING COUNT(*) > 1;
DELETE l FROM links l INNER JOIN duplicates ld ON ld.id = l.id;
To delete the oldest row instead of the newest, change MAX(id) to MIN(id).
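Each run of the two statements above removes only one row per duplicated url, so a url that occurs three or more times needs several passes. A minimal sketch of a check between passes, assuming the same links table and url column as above:

```sql
-- Drop the helper table so it can be recreated for the next pass.
DROP TEMPORARY TABLE IF EXISTS duplicates;

-- List every url that still occurs more than once in links.
-- An empty result means no duplicates remain.
SELECT url, COUNT(*) AS copies
FROM links
GROUP BY url
HAVING COUNT(*) > 1;
```

If this still returns rows, repeat the CREATE TEMPORARY TABLE and DELETE statements and check again.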
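The following works for any table, as long as the duplicated rows are identical in every column (SELECT DISTINCT * compares whole rows):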
CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;
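Dropping the original table before the rename leaves a short window in which `Dup` does not exist. One possible variation, assuming `noDup` has already been created and filled as above, is to swap the names in a single RENAME TABLE statement and keep the old data until the result has been checked. The name `Dup_old` is only a placeholder chosen for this sketch:

```sql
-- RENAME TABLE applies both renames as one atomic operation,
-- so other sessions never see a missing `Dup` table.
RENAME TABLE `Dup` TO `Dup_old`, `noDup` TO `Dup`;

-- Once the deduplicated table has been verified, remove the backup.
DROP TABLE `Dup_old`;
```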
I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very understandable.
As I stated in my comment, that solution hasn't been properly explained, though. So this is mine, based on it.
1) add a new boolean column
alter table mytable add tokeep boolean;
2) add a constraint on the duplicated columns AND the new column
alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);
3) set the boolean column to true. Because of the new constraint, this succeeds on only one row in each group of duplicates; IGNORE makes the update skip the other rows instead of aborting
update ignore mytable set tokeep = true;
4) delete rows that have not been marked as tokeep
delete from mytable where tokeep is null;
5) drop the added column
alter table mytable drop tokeep;
I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
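The five steps can be tried end to end on a throwaway table first. This is only a sketch: the sample rows are made up for illustration, and it assumes the duplicated columns contain no NULLs (MySQL unique indexes do not treat NULLs as equal, so such rows would not be caught).

```sql
-- Hypothetical sample data: (1,1) appears twice, (1,2) three times.
CREATE TABLE mytable (mycol1 INT NOT NULL, mycol2 INT NOT NULL);
INSERT INTO mytable VALUES (1, 1), (1, 1), (1, 2), (1, 2), (1, 2), (2, 1);

-- Steps 1 to 5 from above.
ALTER TABLE mytable ADD tokeep BOOLEAN;
ALTER TABLE mytable ADD CONSTRAINT preventdupe UNIQUE (mycol1, mycol2, tokeep);
UPDATE IGNORE mytable SET tokeep = TRUE;
DELETE FROM mytable WHERE tokeep IS NULL;
ALTER TABLE mytable DROP tokeep;

-- One row per distinct pair should remain: (1,1), (1,2), (2,1).
SELECT mycol1, mycol2 FROM mytable ORDER BY mycol1, mycol2;
```

After the column is dropped, preventdupe covers only mycol1 and mycol2, which is what blocks new duplicates later on.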
Love @eric's answer, but it doesn't seem to work if you have a really big table. When I try to run it I get "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay". So I limited the join query to only consider the duplicate rows, and I ended up with:
DELETE a FROM penguins a
LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
FROM penguins
GROUP BY foo HAVING num > 1) b
ON a.baz != b.keepBaz
AND a.foo = b.foo
WHERE b.foo IS NOT NULL;
The WHERE clause lets MySQL ignore any row that has no duplicate, and the join condition on keepBaz skips the first instance of each duplicate, so only the subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.
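Before running a DELETE like this on a big table, it can be worth previewing which rows it would remove. A hedged sketch, assuming the subquery is meant to group on foo (the column the join compares) and that baz identifies each row within a group:

```sql
-- Returns every row whose foo is duplicated,
-- except the one with the smallest baz in each group.
SELECT a.*
FROM penguins a
INNER JOIN (SELECT MIN(baz) AS keepBaz, foo
            FROM penguins
            GROUP BY foo
            HAVING COUNT(baz) > 1) b
   ON a.foo = b.foo
  AND a.baz != b.keepBaz;
```

The rows returned here are the ones the DELETE would target; the rows with baz equal to keepBaz are the ones that stay.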
There are just a few basic steps when removing duplicate data from your table. Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
I think this will work: it copies the table, empties the original, and then puts only the distinct values back into it. Please double-check it before running it on large amounts of data.
Create a carbon copy of your table:
CREATE TABLE temp_table LIKE oldtablename;
INSERT temp_table SELECT * FROM oldtablename;
Empty your original table:
DELETE FROM oldtablename;
Copy all distinct values from the copied table back to your original table:
INSERT oldtablename SELECT * FROM temp_table GROUP BY firstname, lastname, dob;
Delete your temp table:
DROP TABLE temp_table;
You need to group by all fields that you want to keep distinct.
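One caveat: when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default in recent MySQL versions), SELECT * combined with a GROUP BY on only some columns is usually rejected. A sketch of the copy-back step that avoids this, under the assumption that the table consists of exactly the three columns firstname, lastname and dob:

```sql
-- Every selected column also appears in GROUP BY,
-- so the statement is valid under ONLY_FULL_GROUP_BY.
INSERT INTO oldtablename (firstname, lastname, dob)
SELECT firstname, lastname, dob
FROM temp_table
GROUP BY firstname, lastname, dob;
```

If the table has more columns, each extra one would have to be added to the GROUP BY or wrapped in an aggregate such as MIN() or MAX(), depending on which value should be kept.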