Find and remove duplicate rows by two columns

前端未结

关注

 7  622

I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTI


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  -上瘾入骨i        
                
              
                            
                2021-02-01 09:24
              
            
            
                                                                       
For Mysql: 

DELETE t1 FROM yourtable t1 
  INNER JOIN yourtable t2 WHERE t1.id < t2.id 
    AND t1.identField1 = t2.identField1 
    AND t1.identField2 = t2.identField2;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-02-01 09:28
              
            
            
                                                                       
The best way to delete duplicate rows by multiple columns is the simplest one: 

Add an UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);


The IGNORE above makes sure that only the first found row is kept, the rest discarded.

(You can then drop that index if you need future duplicates and/or know they won't happen again).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  长情又很酷        
                
              
                            
                2021-02-01 09:30
              
            
            
                                                                       
You will first need to find your duplicates by grouping on the two fields with a having clause.

    Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1


If this returns what you want, you can then use it as a subquery and 

  DELETE FROM yourTable WHERE field in (Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1 )

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  夕颜        
                
              
                            
                2021-02-01 09:33
              
            
            
                                                                       
NOTE: This solution is an alternative & old school solution.



If you couldn't achieve what you wanted, then you can try my "oldschool" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc


After that, select those results and paste them into the notepad++:




Now by using the find and replace specialty of the notepad++ replace them with; first "delete" then "insert" queries like this (from now on, for security reasons, my values will be AAAA).

Special Note: Please make another new line for the end of the last line of your data inside notepad++ because regex matched the '\r\n' at the end of the each line:



Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n

Now finally, paste those queries to your MySQL Workbench's query console and execute. You will see only one occurrences of each duplicate record.



This answer is for a relation table constructed of just two columns without ID. I think you can apply it to your situation.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  离开以前        
                
              
                            
                2021-02-01 09:33
              
            
            
                                                                       
In a large data set if you are selecting the multiple columns in the select clause ex:
select x,y,z from table1.
And the requirement is to remove duplicate based on two columns:from above example let y,z
then you may use below instead of using combo of "group by" and "sub query", which is bad in performance:

select x,y,z 
from (
select x,y,z , row_number() over (partition by y,z) as index_num
from table1) main
where main.index_num=1

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2021-02-01 09:44
              
            
            
                                                                       
This works perfectly in any version of MySQL including 5.7+.  It also handles the error You can't specify target table 'my_table' for update in FROM clause by using a double-nested subquery.  It only deletes ONE duplicate row (the later one) so if you have 3 or more duplicates, you can run the query multiple times.  It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)


I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复