How to delete duplicates on a MySQL table?

后端未结

关注

 25  2423

I need to DELETE duplicated rows for specified sid on a MySQL table.

How can I do this with an SQL query?


                      
              相关标签:


      
      
        
          25条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  慢半拍i        
                
              
                            
                2020-11-22 01:57
              
            
            
                                                                       
Here is a simple answer:

delete a from target_table a left JOIN (select max(id_field) as id, field_being_repeated  
    from target_table GROUP BY field_being_repeated) b 
    on a.field_being_repeated = b.field_being_repeated
      and a.id_field = b.id_field
    where b.id_field is null;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-11-22 01:57
              
            
            
                                                                       
DELETE T2
FROM   table_name T1
JOIN   same_table_name T2 ON (T1.title = T2.title AND T1.ID <> T2.ID)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  [愿得一人]        
                
              
                            
                2020-11-22 02:02
              
            
            
                                                                       
this removes duplicates in place, without making a new table

ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)


note: only works well if index fits in memory
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  长情又很酷        
                
              
                            
                2020-11-22 02:02
              
            
            
                                                                       
Following remove duplicates for all SID-s, not only single one.

With temp table

CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;

DROP TABLE table;
RENAME TABLE table_temp TO table;


Since temp_table is freshly created it has no indexes. You'll need to recreate them after removing duplicates. You can check what indexes you have in the table with SHOW INDEXES IN table

Without temp table:

DELETE FROM `table` WHERE id IN (
  SELECT all_duplicates.id FROM (
    SELECT id FROM `table` WHERE (`title`, `SID`) IN (
      SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
    )
  ) AS all_duplicates 
  LEFT JOIN (
    SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
  ) AS grouped_duplicates 
  ON all_duplicates.id = grouped_duplicates.id 
  WHERE grouped_duplicates.id IS NULL
)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2020-11-22 02:03
              
            
            
                                                                       
This always seems to work for me:

CREATE TABLE NoDupeTable LIKE DupeTable; 
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;


Which keeps the lowest ID on each of the dupes and the rest of the non-dupe records.

I've also taken to doing the following so that the dupe issue no longer occurs after the removal:

CREATE TABLE NoDupeTable LIKE DupeTable; 
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;


In other words, I create a duplicate of the first table, add a unique index on the fields I don't want duplicates of, and then do an Insert IGNORE which has the advantage of not failing as a normal Insert would the first time it tried to add a duplicate record based on the two fields and rather ignores any such records.

Moving fwd it becomes impossible to create any duplicate records based on those two fields.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  半阙折子戏        
                
              
                            
                2020-11-22 02:04
              
            
            
                                                                       
here is how I usually eliminate duplicates


add a temporary column, name it whatever you want(i'll refer as active)
group by the fields that you think shouldn't be duplicate and set their active to 1, grouping by will select only one of duplicate values(will not select duplicates)for that columns
delete the ones with active zero
drop column active
optionally(if fits to your purposes), add unique index for those columns to not have duplicates again

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
4
5
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复