How to remove duplicates from a table using an SQL query

遇见更好的自我 2020-12-17 04:34

I have a table which is as follows:

emp_name   emp_address  sex  marital_status
uuuu       eee          m    s
iiii       iii          f    s
uuuu       eee          m    s


        
7 answers
  • 2020-12-17 05:25

    If you are willing to trade space for performance and simplicity, you can eliminate the duplicates in the emp_name, emp_address, sex combination by adding a computed (derived) column based on the T-SQL CHECKSUM() function and then using the DISTINCT keyword when querying.

    Here's an example of CHECKSUM:

    SELECT CHECKSUM(*) FROM HumanResources.Employee WHERE EmployeeID = 2
    

    Google around and create a computed column that contains the checksum of the three columns. You can then select distinct rows as described in this question.
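
    SQLite has no CHECKSUM(), so the following is only a sketch of the derived-column idea, run through Python's sqlite3 module. Assumptions: a hypothetical table `employee` holds the question's sample rows (the third row taken to repeat the first), and a Python function `row_checksum` built on zlib.crc32 stands in for CHECKSUM(). It picks one row per checksum value with GROUP BY rather than DISTINCT, so the full row can be returned. Like CHECKSUM, a CRC can collide for different inputs, so on real data you would also compare the actual columns.

```python
import sqlite3
import zlib


def row_checksum(*values):
    # Hypothetical stand-in for T-SQL CHECKSUM(): CRC32 over the joined values.
    # Different inputs can collide, just as with CHECKSUM.
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return zlib.crc32(joined.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.create_function("row_checksum", 3, row_checksum)
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Derived column holding the checksum of the three key columns, indexed for lookups.
conn.execute("ALTER TABLE employee ADD COLUMN chk INTEGER")
conn.execute("UPDATE employee SET chk = row_checksum(emp_name, emp_address, sex)")
conn.execute("CREATE INDEX ix_employee_chk ON employee (chk)")

# Keep the first physical row for each checksum value.
rows = conn.execute(
    """
    SELECT emp_name, emp_address, sex, marital_status
    FROM employee
    WHERE rowid IN (SELECT MIN(rowid) FROM employee GROUP BY chk)
    ORDER BY rowid
    """
).fetchall()
print(rows)
```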

  • 2020-12-17 05:28

    This is not a query but a DELETE statement (SQL Server syntax): it removes the duplicate rows from your table.

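    ;with C as
    (
      -- number the rows within each group of duplicates; (select null) keeps an
      -- arbitrary row, order by a key or date column to choose which row survives
      select row_number() over(partition by emp_name, emp_address, sex
                               order by (select null)) as rn
      from yourtable
    )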
    delete C
    where rn > 1
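
    SQLite cannot delete through a CTE the way SQL Server does, so this sketch (Python's sqlite3 module, SQLite 3.25 or later for window functions) uses the CTE only to number the rows and then deletes the matching rows by rowid. The table name `employee` and its rows are assumptions based on the question; ordering by rowid keeps the earliest inserted row of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Number the rows inside each (emp_name, emp_address, sex) group, then delete
# every row whose number is greater than 1. SQLite cannot DELETE from the CTE
# itself, so the rows are matched back to the table through rowid.
conn.execute(
    """
    WITH c AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY emp_name, emp_address, sex
                                  ORDER BY rowid) AS rn
        FROM employee
    )
    DELETE FROM employee
    WHERE rowid IN (SELECT rid FROM c WHERE rn > 1)
    """
)
remaining = conn.execute("SELECT * FROM employee ORDER BY rowid").fetchall()
print(remaining)
```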
    

    If you only want to query the table and get the non-duplicate rows back, use this instead:

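    ;with C as
    (
      select *,
             -- number the rows within each group of duplicates; (select null)
             -- keeps an arbitrary row, order by a key or date column instead
             row_number() over(partition by emp_name, emp_address, sex
                               order by (select null)) as rn
      from yourtable
    )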
    select *
    from C
    where rn = 1
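
    The query-only variant can be tried the same way. This sketch assumes a hypothetical `employee` table holding the question's rows and SQLite 3.25 or later (for ROW_NUMBER) via Python's sqlite3 module; it returns one row per group and leaves the table itself untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Number the rows per group and return only the first of each; nothing is deleted.
rows = conn.execute(
    """
    WITH c AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY emp_name, emp_address, sex
                                  ORDER BY rowid) AS rn
        FROM employee
    )
    SELECT emp_name, emp_address, sex, marital_status
    FROM c
    WHERE rn = 1
    ORDER BY emp_name
    """
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rows, total)
```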
    
  • 2020-12-17 05:34

    It looks like all four column values are duplicated, so you can do this:

    select distinct emp_name, emp_address, sex, marital_status
    from YourTable
    

    However, if marital_status can differ and you have another column to choose by (for example, you want the latest record according to a create_date column), you can do this:

    select emp_name, emp_address, sex, marital_status
    from YourTable a
    where not exists (select 1 
                       from YourTable b
                      where b.emp_name = a.emp_name and
                            b.emp_address = a.emp_address and
                            b.sex = a.sex and
                            b.create_date > a.create_date)  -- strict >, so a row never excludes itself
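
    A sketch of the "latest record wins" query in SQLite via Python's sqlite3 module. The `create_date` column and the later `uuuu` record with a different marital_status are hypothetical, invented only to exercise the query. The sketch uses a strict `>`: with `>=`, every row would match itself in the subquery and nothing would be returned. Two rows with the same key and the same create_date would both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT,"
    " marital_status TEXT, create_date TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        ("uuuu", "eee", "m", "s", "2020-01-01"),
        ("iiii", "iii", "f", "s", "2020-01-02"),
        ("uuuu", "eee", "m", "m", "2020-06-01"),  # hypothetical later record
    ],
)

# Keep a row only if no row with the same key has a strictly later create_date.
# ISO-8601 date strings compare correctly as text.
rows = conn.execute(
    """
    SELECT emp_name, emp_address, sex, marital_status
    FROM employee a
    WHERE NOT EXISTS (SELECT 1
                      FROM employee b
                      WHERE b.emp_name = a.emp_name
                        AND b.emp_address = a.emp_address
                        AND b.sex = a.sex
                        AND b.create_date > a.create_date)
    ORDER BY emp_name
    """
).fetchall()
print(rows)
```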
    
  • 2020-12-17 05:36

    One way:

    select emp_name, emp_address, sex, max(marital_status) as marital_status
    from Yourtable
    group by emp_name, emp_address, sex
    

    Since I don't know which value you want, I used MAX for marital_status.

    See also "Including an Aggregated Column's Related Values" for more examples.
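
    To see what MAX does when the duplicates disagree, here is a sketch in SQLite via Python's sqlite3 module. The hypothetical `employee` table gives the second `uuuu` row a different marital_status ('m' instead of 's'); MAX compares the values as text, so 's' is kept for that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("uuuu", "eee", "m", "s"),
        ("iiii", "iii", "f", "s"),
        ("uuuu", "eee", "m", "m"),  # hypothetical: same person, different status
    ],
)

# One row per (emp_name, emp_address, sex); MAX picks one marital_status per group.
rows = conn.execute(
    """
    SELECT emp_name, emp_address, sex, MAX(marital_status) AS marital_status
    FROM employee
    GROUP BY emp_name, emp_address, sex
    ORDER BY emp_name
    """
).fetchall()
print(rows)
```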

  • 2020-12-17 05:37

    I would create a new table with a unique index over the columns that you want to keep unique, then insert the rows from the old table into the new one, ignoring the warnings about duplicate rows. Lastly, I would drop (or rename) the old table and replace it with the new one. In MySQL, this would look like:

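    -- new empty table with the same structure as the original
    CREATE TABLE tmp LIKE mytable;
    -- the unique index makes duplicate rows impossible in tmp
    ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex, marital_status);
    -- IGNORE skips rows that would violate the index (reported as warnings)
    INSERT IGNORE INTO tmp SELECT * FROM mytable;
    -- swap the deduplicated table in for the original
    DROP TABLE mytable;
    RENAME TABLE tmp TO mytable;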
    

    Or something similar (this is totally untested).
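
    SQLite has neither CREATE TABLE ... LIKE nor INSERT IGNORE, so this sketch (Python's sqlite3 module, hypothetical `employee` table holding the question's rows) lists the columns explicitly and uses INSERT OR IGNORE, which silently skips rows that would violate the unique index. In both SQLite and MySQL a unique index treats NULLs as distinct, so rows with a NULL in a key column are not deduplicated this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

conn.executescript(
    """
    -- SQLite has no CREATE TABLE ... LIKE, so the columns are spelled out.
    CREATE TABLE tmp (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex, marital_status);
    -- OR IGNORE skips any row that would violate the unique index.
    INSERT OR IGNORE INTO tmp SELECT * FROM employee;
    DROP TABLE employee;
    ALTER TABLE tmp RENAME TO employee;
    """
)
rows = conn.execute("SELECT * FROM employee ORDER BY emp_name").fetchall()
print(rows)
```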

  • 2020-12-17 05:38

    The best answer is here:
    Use this SQL statement to identify the extra duplicated rows:

    select * from Employee a
    where %%physloc%% >
      (select min(%%physloc%%) from Employee b
       where a.emp_name = b.emp_name and a.emp_address = b.emp_address and a.sex = b.sex);

    You will get the extra row:

    uuuu   eee m   s 
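
    SQLite has no %%physloc%%, but its rowid plays the same role that Oracle's rowid does here. This sketch (Python's sqlite3 module, hypothetical `employee` table holding the question's rows) runs the identifying query; it should return only the second `uuuu` row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# A row is "extra" if another row with the same key has a lower rowid.
extra = conn.execute(
    """
    SELECT * FROM employee a
    WHERE a.rowid > (SELECT MIN(b.rowid) FROM employee b
                     WHERE a.emp_name = b.emp_name
                       AND a.emp_address = b.emp_address
                       AND a.sex = b.sex)
    """
).fetchall()
print(extra)
```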


    Use this SQL statement to delete the extra duplicated rows:

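    -- SQL Server only accepts an alias on the target in the "delete a from Employee a" form.
    -- In Oracle, write "delete from Employee a" and use rowid instead of %%physloc%%.
    delete a from Employee a
    where %%physloc%% >
      (select min(%%physloc%%) from Employee b
       where a.emp_name = b.emp_name and a.emp_address = b.emp_address and a.sex = b.sex);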


    For each set of duplicates, only the row with the lowest physical location is kept. The same pattern works for any definition of a duplicate; just change the column comparisons in the subquery.

    I am assuming that you use MS SQL Server. If you are using Oracle Database, you can just replace %%physloc%% with rowid.
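
    The delete can be tried the same way with SQLite's rowid in place of %%physloc%%. In this sketch (Python's sqlite3 module, hypothetical `employee` table holding the question's rows) the DELETE target is written without an alias, so the subquery refers to the outer row by the table name `employee`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Delete every row whose rowid is above the lowest rowid of its duplicate group.
conn.execute(
    """
    DELETE FROM employee
    WHERE rowid > (SELECT MIN(b.rowid) FROM employee b
                   WHERE b.emp_name = employee.emp_name
                     AND b.emp_address = employee.emp_address
                     AND b.sex = employee.sex)
    """
)
remaining = conn.execute("SELECT * FROM employee ORDER BY rowid").fetchall()
print(remaining)
```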

    Enjoy the code!
