问题
I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:
INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID)
SELECT DISTINCT MAILING_ID,REPORT_IDFROM table_1
;
The problem is that I want to remove duplicates by col1 and col2, but also want to include to the insert all the other fields of table_1.
I tried to add all the relevant columns this way:
INSERT IGNORE INTO temp(M_ID,MAILING_ID,REPORT_ID,
MAILING_NAME,VISIBILITY,EXPORTED) SELECT DISTINCT
M_ID,MAILING_ID,REPORT_ID,MAILING_NAME,VISIBILITY,
EXPORTED FROM table_1
;
M_ID(int,primary),MAILING_ID(int),REPORT_ID(int),
MAILING_NAME(varchar),VISIBILITY(varchar),EXPORTED(int)
But it inserted all rows into temp (including duplicates)
回答1:
The best way to delete duplicate rows by multiple columns is the simplest one:
Add an UNIQUE index:
ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);
The IGNORE above makes sure that only the first found row is kept, the rest discarded.
(You can then drop that index if you need future duplicates and/or know they won't happen again).
回答2:
This works perfectly in any version of MySQL including 5.7+. It also handles the error You can't specify target table 'my_table' for update in FROM clause
by using a double-nested subquery. It only deletes ONE duplicate row (the later one) so if you have 3 or more duplicates, you can run the query multiple times. It never deletes unique rows.
DELETE FROM my_table
WHERE id IN (
SELECT calc_id FROM (
SELECT MAX(id) AS calc_id
FROM my_table
GROUP BY identField1, identField2
HAVING COUNT(id) > 1
) temp
)
I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
回答3:
For Mysql:
DELETE t1 FROM yourtable t1
INNER JOIN yourtable t2 WHERE t1.id < t2.id
AND t1.identField1 = t2.identField1
AND t1.identField2 = t2.identField2;
回答4:
You will first need to find your duplicates by grouping on the two fields with a having clause.
Select identField1, identField2, count(*) FROM yourTable
GROUP BY identField1, identField2
HAVING count(*) >1
If this returns what you want, you can then use it as a subquery and
DELETE FROM yourTable WHERE field in (Select identField1, identField2, count(*) FROM yourTable
GROUP BY identField1, identField2
HAVING count(*) >1 )
回答5:
NOTE: This solution is an alternative & old school solution.
If you couldn't achieve what you wanted, then you can try my "oldschool" method:
First, run this query to get the duplicate records:
select column1,
column2,
count(*)
from table
group by column1,
column2
having count(*) > 1
order by count(*) desc
After that, select those results and paste them into the notepad++:
Now by using the find and replace specialty of the notepad++ replace them with; first "delete" then "insert" queries like this (from now on, for security reasons, my values will be AAAA).
Special Note: Please make another new line for the end of the last line of your data inside notepad++ because regex matched the '\r\n' at the end of the each line:
Find what regex: \D*(\d+)\D*(\d+)\D*\r\n
Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n
Now finally, paste those queries to your MySQL Workbench's query console and execute. You will see only one occurrences of each duplicate record.
This answer is for a relation table constructed of just two columns without ID. I think you can apply it to your situation.
回答6:
you can always get the primary ids by grouping that two unique fields
select count(*), id as count from table group by col a, col b having count(*)>1;
and then
delete from table where id in ( select count(*), id as count from table group by col a, col b having count(*)>1) limit maxlimit;
you can also use max()
in place of limit
回答7:
In a large data set if you are selecting the multiple columns in the select clause ex:
select x,y,z from table1
.
And the requirement is to remove duplicate based on two columns:from above example let y,z
then you may use below instead of using combo of "group by" and "sub query", which is bad in performance:
select x,y,z
from (
select x,y,z , row_number() over (partition by y,z) as index_num
from table1) main
where main.index_num=1
来源:https://stackoverflow.com/questions/14340820/find-and-remove-duplicate-rows-by-two-columns