How to delete duplicates on a MySQL table?


I need to delete duplicate rows for a specified sid in a MySQL table.

How can I do this with an SQL query?


                      
25 answers
                
你的背包 (2020-11-22 02:05)
                                                                       
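This works for large tables:

 CREATE TEMPORARY TABLE duplicates AS SELECT MAX(id) AS id, url FROM links GROUP BY url HAVING COUNT(*) > 1;

 DELETE l FROM links l INNER JOIN duplicates ld ON ld.id = l.id;


This deletes the newest row of each duplicated url. To delete the oldest instead, change MAX(id) to MIN(id). Each pass removes only one row per duplicated url, so if a url appears more than twice, drop the temporary table and repeat until the SELECT returns no rows.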
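
Because the temporary table holds a single id per url, one run of the statements above deletes a single row per duplicated url. A possible single-pass variant is sketched below: it records the id to keep and deletes every other row that shares the url. The names `keepers` and `keep_id` are made up for illustration, and the sketch assumes `id` is unique in `links`.

```sql
-- Remember the row to keep (lowest id) for every url that occurs more than once.
CREATE TEMPORARY TABLE keepers AS
    SELECT MIN(id) AS keep_id, url
    FROM links
    GROUP BY url
    HAVING COUNT(*) > 1;

-- Delete every row of a duplicated url except the one being kept.
DELETE l FROM links l
    INNER JOIN keepers k ON k.url = l.url
WHERE l.id <> k.keep_id;

DROP TEMPORARY TABLE keepers;
```

Use MAX(id) instead of MIN(id) to keep the newest row of each group.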
                                                                        
                                                        
            
            
              
                
                
渐次进展 (2020-11-22 02:06)
                                                                       
The following works for any table, with or without a primary key, as long as the duplicate rows are identical in every column:

CREATE TABLE `noDup` LIKE `Dup`;
INSERT INTO `noDup` SELECT DISTINCT * FROM `Dup`;
DROP TABLE `Dup`;
ALTER TABLE `noDup` RENAME TO `Dup`;
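
If dropping the original table straight away feels risky, the same copy-and-swap idea can be written so that the old rows stay around until the result has been checked. This is only a sketch; `Dup_old` is a hypothetical name for the retained original.

```sql
-- Build the de-duplicated copy, as in the answer.
CREATE TABLE `noDup` LIKE `Dup`;
INSERT INTO `noDup` SELECT DISTINCT * FROM `Dup`;

-- Compare row counts before swapping anything.
SELECT (SELECT COUNT(*) FROM `Dup`)   AS rows_before,
       (SELECT COUNT(*) FROM `noDup`) AS rows_after;

-- RENAME TABLE performs both renames as one atomic operation.
RENAME TABLE `Dup` TO `Dup_old`, `noDup` TO `Dup`;

-- Once the new table looks right:
-- DROP TABLE `Dup_old`;
```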

                                                                        
                                                        
            
            
              
                
                
挽巷 (2020-11-22 02:06)
                                                                       
I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is easy to understand.

As I stated in my comment, though, that solution hasn't been properly explained.
So this is mine, based on it.

1) add a new boolean column

alter table mytable add tokeep boolean;


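2) add a unique constraint on the duplicated columns AND the new column. This succeeds even though duplicates exist, because tokeep is still NULL everywhere and NULLs never collide in a unique index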

alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);


3) set the boolean column to true. Because of the new constraint this succeeds on only one row of each duplicate group; update ignore skips the rows that would violate it instead of aborting

update ignore mytable set tokeep = true;


4) delete rows that have not been marked as tokeep

delete from mytable where tokeep is null;


5) drop the added column

alter table mytable drop tokeep;


I suggest that you keep the constraint you added, so that new duplicates are prevented in the future. Once tokeep is dropped, the constraint covers only the duplicated columns (mycol1, mycol2).
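
To make the five steps concrete, here is a small end-to-end run. The table definition and sample rows are invented for illustration; only the column and constraint names come from the steps above. Both key columns are declared NOT NULL on purpose, since rows with a NULL in mycol1 or mycol2 would never count as duplicates of each other under a unique index.

```sql
-- Hypothetical sample table containing two duplicate groups.
CREATE TABLE mytable (
    mycol1 INT NOT NULL,
    mycol2 VARCHAR(10) NOT NULL
);
INSERT INTO mytable VALUES
    (1, 'a'), (1, 'a'),
    (2, 'b'), (2, 'b'), (2, 'b'),
    (3, 'c');

-- 1) and 2): the new column is NULL everywhere, so the unique constraint can be added.
ALTER TABLE mytable ADD tokeep BOOLEAN;
ALTER TABLE mytable ADD CONSTRAINT preventdupe UNIQUE (mycol1, mycol2, tokeep);

-- 3) one row per (mycol1, mycol2) pair is marked; the others are skipped.
UPDATE IGNORE mytable SET tokeep = TRUE;

-- 4) and 5): remove the unmarked rows, then the helper column.
DELETE FROM mytable WHERE tokeep IS NULL;
ALTER TABLE mytable DROP tokeep;

-- One row per distinct (mycol1, mycol2) pair should remain.
SELECT * FROM mytable ORDER BY mycol1;
```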
                                                                        
                                                        
            
            
              
                
                
旧时难觅i (2020-11-22 02:06)
                                                                       
Love @eric's answer, but it doesn't seem to work on a really big table. When I try to run it I get: The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. So I limited the join query to only consider the duplicate rows and ended up with:

DELETE a FROM penguins a
    LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
        FROM penguins
        GROUP BY foo HAVING num > 1) b
        ON a.baz != b.keepBaz
        AND a.foo = b.foo
    WHERE b.foo IS NOT NULL;


Here foo is the column that contains duplicates and baz is a unique id. The WHERE clause lets MySQL skip any row that has no duplicate, and the join condition a.baz != b.keepBaz skips the first instance of each duplicate, so only the subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.
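
Before running a DELETE like this on a big table, it may help to preview what it would remove. The query below is a sketch of such a dry run, assuming the duplicates are grouped on foo and that baz is a unique id; it selects the rows that would be deleted without changing anything.

```sql
-- Dry run: every row that shares foo with another row but is not the kept one.
SELECT a.*
FROM penguins a
    INNER JOIN (SELECT MIN(baz) AS keepBaz, foo
        FROM penguins
        GROUP BY foo
        HAVING COUNT(*) > 1) b
        ON a.foo = b.foo
        AND a.baz != b.keepBaz
ORDER BY a.foo, a.baz;
```

The INNER JOIN returns the same rows as a LEFT JOIN filtered with WHERE b.foo IS NOT NULL.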
                                                                        
                                                        
            
            
              
                
                
谎友^ (2020-11-22 02:06)
                                                                       
There are just a few basic steps when removing duplicate data from your table:


1) Back up your table!
2) Find the duplicate rows
3) Remove the duplicate rows


Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
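
In case the link goes stale, the three steps might look roughly like this in MySQL. This is a sketch rather than the tutorial's own code: `mytable`, `mytable_backup`, `mycol1`, `mycol2` and a unique `id` column are all assumed names.

```sql
-- 1) Back up: copy the structure and the rows into a second table.
CREATE TABLE mytable_backup LIKE mytable;
INSERT INTO mytable_backup SELECT * FROM mytable;

-- 2) Find: list each duplicated (mycol1, mycol2) pair and how often it occurs.
SELECT mycol1, mycol2, COUNT(*) AS copies
FROM mytable
GROUP BY mycol1, mycol2
HAVING COUNT(*) > 1;

-- 3) Remove: delete every row for which a matching row with a lower id exists,
--    which leaves the lowest id of each group in place.
DELETE t FROM mytable t
    INNER JOIN mytable keep
        ON keep.mycol1 = t.mycol1
        AND keep.mycol2 = t.mycol2
        AND keep.id < t.id;
```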
                                                                        
                                                        
            
            
              
                
                
刺人心 (2020-11-22 02:08)
                                                                       
I think this will work: it copies the table, empties the original, and then puts only the distinct rows back. Please double-check it before running it on large amounts of data.

Create a carbon copy of your table:


  CREATE TABLE temp_table LIKE oldtablename;
  INSERT INTO temp_table SELECT * FROM oldtablename;


Empty your original table:


  DELETE FROM oldtablename;


Copy the distinct rows from the copy back into your original table:


  INSERT INTO oldtablename SELECT * FROM temp_table GROUP BY firstname, lastname, dob;


Delete the temporary copy:


  DROP TABLE temp_table;


You need to group by all the fields that you want to keep distinct. If the table has columns beyond those, SELECT * with GROUP BY is normally rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5), so this form needs that mode switched off.
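
If the table also has a unique id column, which is an assumption and not something the answer states, the copy-back step can be written without SELECT * ... GROUP BY, so it should also run with ONLY_FULL_GROUP_BY enabled. A sketch of the whole sequence under that assumption:

```sql
-- Carbon copy of the original table.
CREATE TABLE temp_table LIKE oldtablename;
INSERT INTO temp_table SELECT * FROM oldtablename;

-- Empty the original.
DELETE FROM oldtablename;

-- Copy back one full row (the lowest id) per firstname, lastname, dob group.
INSERT INTO oldtablename
SELECT t.*
FROM temp_table t
    INNER JOIN (SELECT MIN(id) AS id
        FROM temp_table
        GROUP BY firstname, lastname, dob) k
        ON k.id = t.id;

-- Remove the copy once the result has been checked.
DROP TABLE temp_table;
```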
                                                                        
                                                        
            
            
              
                