How to remove duplicates from table using SQL query

后端未结

关注

 7  1572

I have a table which is as follows:

emp_name   emp_address  sex  matial_status  
uuuu       eee          m    s
iiii       iii          f    s
uuuu       eee


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  你的背包        
                
              
                            
                2020-12-17 05:25
              
            
            
                                                                       
If you are okay with trading space for performance and simplicity then the duplicates in emp_name | emp_address | sex combo can be eliminated, by the introduction of a calculated/derived column using CHECKSUM() TSQL method and DISTINCT keyword while querying.

Heres an example of CHECKSUM :

SELECT CHECKSUM(*) FROM HumanResources.Employee WHERE EmployeeID = 2


Google around and create a dependent column that contains the checksum of the 3 columns.
Then you can select distinct rows by looking at this question
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光说笑        
                
              
                            
                2020-12-17 05:28
              
            
            
                                                                       
This is not a query but a delete statement. It will delete/remove duplicate rows from your table

;with C as
(
  select row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
delete C
where rn > 1


If you are only interested in querying the table and get the non duplicates as a result you should use this instead.

;with C as
(
  select *,
         row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
select *
from C
where rn = 1

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  余生分开走        
                
              
                            
                2020-12-17 05:34
              
            
            
                                                                       
It looks like all four column values are duplicated so you can do this - 

select distinct emp_name, emp_address, sex, marital_status
from YourTable


However if marital status can be different and you have some other column based on which to choose (for eg you want latest record based on a column create_date) you can do this

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1 
                   from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        b.create_date >= a.create_date)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2020-12-17 05:36
              
            
            
                                                                       
one way

select emp_name,   emp_address,  sex,  max(marital_status) as marital_status
from Yourtable
group by emp_name,   emp_address,  sex


Since I don't know what you want, I used max for the marital status

See also Including an Aggregated Column's Related Values for more examples
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2020-12-17 05:37
              
            
            
                                                                       
I would create a new table with a unique index over the columns that you want to keep unique. Then do an insert from the old table into the new, ignoring the warnings about duplicated rows. Lastly, I would drop (or rename) the old table and replace it with the new table. In MySQL, this would look like

CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex, marital_status);
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;


Or something similar (this is totally untested).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  难免孤独        
                
              
                            
                2020-12-17 05:38
              
            
            
                                                                       
The best answer is here:

Use this SQL statement to identify the extra duplicated rows:


 select * from Employee a 

    where %%physloc%% > 

        (select min(%%physloc%%) from Employee b 

            where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex); 


you will get the extra row:


uuuu   eee m   s         



Use this SQL statement to delete the extra duplicated rows:

 delete from Employee a 

    where %%physloc%% > 

        (select min(%%physloc%%) from Employee b 

            where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex); 
 



For all duplicated records, only the one with lowest physical location is kept. This method can be applied to remove all kinds of duplicated rows.


I am assuming that you use MS SQL Server. If you are using Oracle DB, then you can just replace '%%physloc%%' with 'rowid'

Enjoy the code!


                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复