Find and remove duplicate rows by two columns

鱼传尺愫 2021-02-01 08:52

I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTI         


        
7 Answers
  • 2021-02-01 09:24

    For MySQL:

    -- Keeps the row with the highest id in each duplicate group
    DELETE t1 FROM yourtable t1 
      INNER JOIN yourtable t2
        ON t1.identField1 = t2.identField1 
        AND t1.identField2 = t2.identField2
      WHERE t1.id < t2.id;
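
    To try the "keep the highest id" rule without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no multi-table DELETE, so the self-join is rewritten as a correlated EXISTS; that rewrite and the sample rows are assumptions made for illustration, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    "id INTEGER PRIMARY KEY, identField1 INTEGER, identField2 INTEGER)"
)
# Hypothetical sample data: ids 1, 2 and 4 share the key (1, 1).
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 1), (5, 2, 2)],
)

# Delete every row for which a row with the same key and a higher id exists.
conn.execute(
    """
    DELETE FROM yourtable
    WHERE EXISTS (
        SELECT 1 FROM yourtable AS t2
        WHERE t2.id > yourtable.id
          AND t2.identField1 = yourtable.identField1
          AND t2.identField2 = yourtable.identField2
    )
    """
)

remaining = conn.execute(
    "SELECT id, identField1, identField2 FROM yourtable ORDER BY id"
).fetchall()
print(remaining)  # [(3, 1, 2), (4, 1, 1), (5, 2, 2)]
```

    Flipping the comparison to t2.id < yourtable.id would keep the lowest id in each group instead.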
    
  • 2021-02-01 09:28

    The simplest way to delete duplicate rows by multiple columns is to add a UNIQUE index:

    ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);
    

    The IGNORE keyword makes sure that only the first row found for each key is kept and the rest are discarded. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older servers.

    (You can drop the index afterwards if you need to allow duplicates again or know they won't recur.)
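
    Where ALTER IGNORE TABLE is not available, a similar effect can be approximated by copying the rows into a new table that already has the unique key and letting the insert skip collisions. The sketch below uses Python's sqlite3 module, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE; the table name your_table_dedup and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (field1 INTEGER, field2 INTEGER, field3 TEXT)"
)
# Hypothetical sample data with (1, 1, 'a') stored three times.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 2, "a"), (1, 1, "a"), (1, 2, "b")],
)

# New table with the unique key in place from the start.
conn.execute(
    "CREATE TABLE your_table_dedup ("
    "field1 INTEGER, field2 INTEGER, field3 TEXT, "
    "UNIQUE (field1, field2, field3))"
)
# Rows that would violate the unique key are silently skipped.
conn.execute(
    "INSERT OR IGNORE INTO your_table_dedup "
    "SELECT field1, field2, field3 FROM your_table"
)
# Swap the deduplicated copy in under the original name.
conn.execute("DROP TABLE your_table")
conn.execute("ALTER TABLE your_table_dedup RENAME TO your_table")

rows = conn.execute(
    "SELECT field1, field2, field3 FROM your_table "
    "ORDER BY field1, field2, field3"
).fetchall()
print(rows)  # [(1, 1, 'a'), (1, 2, 'a'), (1, 2, 'b')]
```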

  • 2021-02-01 09:30

    You will first need to find your duplicates by grouping on the two fields with a HAVING clause.

        SELECT identField1, identField2, COUNT(*) FROM yourTable
            GROUP BY identField1, identField2
              HAVING COUNT(*) > 1
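
    As a quick check of what the grouping query returns, here is a small sketch run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical, and SQLite stands in for MySQL only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "id INTEGER PRIMARY KEY, identField1 INTEGER, identField2 INTEGER)"
)
# Hypothetical sample data: the key (1, 1) appears three times.
conn.executemany(
    "INSERT INTO yourTable (identField1, identField2) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 1), (2, 2)],
)

# One output row per key that occurs more than once, with its row count.
duplicates = conn.execute(
    """
    SELECT identField1, identField2, COUNT(*)
    FROM yourTable
    GROUP BY identField1, identField2
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)  # [(1, 1, 3)]
```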
    

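    If this returns what you want, you can then use it as a subquery in a DELETE. The subquery must return only the two key columns, and MySQL needs it wrapped in a derived table because it reads from the table being deleted from. Be aware that this removes every row in each duplicated group, including the copy you may want to keep:

      DELETE FROM yourTable
      WHERE (identField1, identField2) IN (
        SELECT identField1, identField2 FROM (
          SELECT identField1, identField2 FROM yourTable
            GROUP BY identField1, identField2
              HAVING COUNT(*) > 1) AS dup)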
    
  • 2021-02-01 09:33

    NOTE: This is an alternative, old-school solution.


    If the other approaches didn't get you what you wanted, you can try my "old-school" method.

    First, run this query to get the duplicate records:

    select   column1,
             column2,
             count(*)
    from     table
    group by column1,
             column2
    having   count(*) > 1
    order by count(*) desc
    

    After that, copy those results and paste them into Notepad++.

    Now use Notepad++'s find-and-replace feature (in regular expression mode) to turn each line into a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

    Special note: add an empty new line after the last line of your data in Notepad++, because the regex matches the '\r\n' at the end of each line:

    Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

    Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n
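
    The same find-and-replace can be scripted instead of done by hand. The sketch below applies the regex above with Python's re module, assuming each pasted line holds only the two numeric key values (a trailing count(*) column would shift the captures) and ends in '\r\n'. The sample values are hypothetical.

```python
import re

# Hypothetical pasted result: "column1<TAB>column2" per line, CRLF endings,
# including the extra line break after the last row.
pasted = "12\t34\r\n56\t78\r\n"

# Same pattern as in the answer: two runs of digits, then the line ending.
pattern = re.compile(r"\D*(\d+)\D*(\d+)\D*\r\n")

# \1 and \2 are Python's spelling of the $1 and $2 used in Notepad++.
replacement = (
    r"delete from table where column1 = \1 and column2 = \2; "
    r"insert into table set column1 = \1, column2 = \2;"
    "\r\n"
)

statements = pattern.sub(replacement, pasted)
print(statements)
```

    Each input line becomes one "delete ...; insert ...;" pair, so every duplicated key ends up with exactly one row after the statements run.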

    Finally, paste those queries into your MySQL Workbench query console and execute them. Only one occurrence of each duplicate record will remain.

    This answer is for a relation table made up of just two columns, without an ID. I think you can adapt it to your situation.

  • 2021-02-01 09:33

    On a large data set, suppose you select multiple columns, e.g. select x,y,z from table1, and need to remove duplicates based on two of them, say y and z. Instead of combining "group by" with a subquery, which performs badly, you can use a window function (available in MySQL 8.0 and later):

    select x,y,z 
    from (
    select x,y,z , row_number() over (partition by y,z) as index_num
    from table1) main
    where main.index_num=1
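
    A runnable sketch of the row_number() approach, using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The ORDER BY x inside the window is an addition not present in the query above; it is assumed here so that the row kept in each (y, z) group is deterministic. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER, y TEXT, z TEXT)")
# Hypothetical sample data: (a, p) and (b, q) each occur twice.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "a", "p"), (2, "a", "p"), (3, "b", "q"), (4, "a", "q"), (5, "b", "q")],
)

# Number the rows inside each (y, z) group and keep only the first one.
deduped = conn.execute(
    """
    SELECT x, y, z
    FROM (
        SELECT x, y, z,
               ROW_NUMBER() OVER (PARTITION BY y, z ORDER BY x) AS index_num
        FROM table1
    ) AS main
    WHERE main.index_num = 1
    ORDER BY x
    """
).fetchall()
print(deduped)  # [(1, 'a', 'p'), (3, 'b', 'q'), (4, 'a', 'q')]
```

    Note that this only selects a deduplicated result set; the rows in table1 itself are left untouched.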
    
  • 2021-02-01 09:44

    This works in any version of MySQL, including 5.7+. It also avoids the error "You can't specify target table 'my_table' for update in FROM clause" by using a double-nested subquery. Each run deletes only ONE duplicate row per group (the one with the highest id), so if you have 3 or more duplicates, run the query repeatedly until no rows are affected. It never deletes unique rows.

    DELETE FROM my_table
    WHERE id IN (
      SELECT calc_id FROM (
        SELECT MAX(id) AS calc_id
        FROM my_table
        GROUP BY identField1, identField2
        HAVING COUNT(id) > 1
      ) temp
    )
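
    Because each run removes at most one row per duplicate group, the statement can be repeated until it stops deleting anything. Here is a minimal sketch of that loop using Python's sqlite3 module as a stand-in for MySQL; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, identField1 INTEGER, identField2 INTEGER)"
)
# Hypothetical sample data: ids 1, 2 and 4 share the key (1, 1).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 1), (5, 2, 2)],
)

DELETE_ONE_PER_GROUP = """
    DELETE FROM my_table
    WHERE id IN (
        SELECT calc_id FROM (
            SELECT MAX(id) AS calc_id
            FROM my_table
            GROUP BY identField1, identField2
            HAVING COUNT(id) > 1
        ) temp
    )
"""

# Re-run the delete until a pass removes nothing.
deleted_per_pass = []
while True:
    deleted = conn.execute(DELETE_ONE_PER_GROUP).rowcount
    deleted_per_pass.append(deleted)
    if deleted == 0:
        break

remaining = conn.execute(
    "SELECT id, identField1, identField2 FROM my_table ORDER BY id"
).fetchall()
print(deleted_per_pass)  # [1, 1, 0]
print(remaining)         # [(1, 1, 1), (3, 1, 2), (5, 2, 2)]
```

    Since the highest id goes first on every pass, the row that survives in each group is the one with the lowest id.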
    

    I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
