Deleting duplicates from a large table

没有蜡笔的小新 2021-01-02 02:40

I have quite a large table with 19,000,000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seems to give an answer that works in my case.

5 Answers
  • 2021-01-02 02:50

    This query worked for every case I tried; tested on the MyISAM engine with 2 million rows. Note that `ALTER IGNORE` was removed in MySQL 5.7, so this only works on older versions.

    ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);
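Since `ALTER IGNORE` silently discards the rows that violate the new unique key, a more portable sketch of the same idea (assuming a table named `table_name` with the question's `location_id`/`datetime` columns; not tested against the actual schema) is to build a deduplicated copy and swap it in:

```sql
-- Sketch: copy the table, add the unique key first, then let
-- INSERT IGNORE skip any row that would violate it.
CREATE TABLE table_name_dedup LIKE table_name;
ALTER TABLE table_name_dedup ADD UNIQUE (location_id, datetime);
INSERT IGNORE INTO table_name_dedup SELECT * FROM table_name;
RENAME TABLE table_name TO table_name_old, table_name_dedup TO table_name;
DROP TABLE table_name_old;
```

The atomic `RENAME TABLE` swap keeps the original around as `table_name_old` until you are sure the copy is good.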

  • 2021-01-02 02:54

    You can delete the duplicates using these steps. 1- Export the results of the following query into a txt file:

    select dup_col from table1 group by dup_col having count(dup_col) > 1
    

    2- Paste the exported values into the IN list of this query and run it:

    delete from table1 where dup_col in (.....)
    

    Please note that '...' stands for the contents of the txt file created in the first step. Beware that this deletes every row whose dup_col value is duplicated, including the one copy you may want to keep.
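If the goal is to keep one row per duplicated value rather than delete them all, a common MySQL pattern is a self-join delete. This is a sketch that assumes the table has an auto-increment primary key `id`, which the answer does not state:

```sql
-- Deletes every row for which an older row (smaller id) with the
-- same dup_col exists, keeping the smallest id per group.
DELETE t1 FROM table1 t1
JOIN table1 t2
  ON t1.dup_col = t2.dup_col
 AND t1.id > t2.id;
```

This also avoids the export/import round trip, which matters on a 19-million-row table.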

  • 2021-01-02 02:55
    SELECT location_id, datetime, COUNT(*) AS Count
    FROM table
    GROUP BY location_id, datetime
    HAVING Count > 1
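The query above only lists the duplicate groups; it does not delete anything. A minimal runnable sketch of the full find-then-delete flow, using SQLite and made-up sample data (since the question's MySQL schema is not fully known), looks like this:

```python
import sqlite3

# In-memory toy table standing in for the question's large table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (location_id INTEGER, datetime TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "2021-01-01"), (1, "2021-01-01"),   # duplicate pair
     (1, "2021-01-02"),
     (2, "2021-01-01"), (2, "2021-01-01"), (2, "2021-01-01")],  # triple
)

# Step 1: list duplicate groups (HAVING > 1, so pairs are caught too).
dups = conn.execute(
    "SELECT location_id, datetime, COUNT(*) FROM t "
    "GROUP BY location_id, datetime HAVING COUNT(*) > 1"
).fetchall()

# Step 2: delete every row except the one with the smallest rowid
# in each (location_id, datetime) group.
conn.execute(
    "DELETE FROM t WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM t GROUP BY location_id, datetime)"
)
remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

After the delete, `remaining` is 3: one row per distinct (location_id, datetime) pair.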
    
  • 2021-01-02 03:01
    UPDATE `table` t
    JOIN `table` tBis
      ON tBis.location_id = t.location_id
     AND t.datetime > tBis.datetime
    SET t.datetime = NULL;

    CREATE TABLE tableCopyWithNoDuplicate AS
    SELECT * FROM `table` WHERE datetime IS NOT NULL;

    DROP TABLE `table`;

    RENAME TABLE tableCopyWithNoDuplicate TO `table`;


    So you keep the row with the lowest datetime for each location_id. Beware that this collapses each location_id to its single earliest datetime, and rows with exactly equal datetimes are not removed at all. I'm not sure about performance; it depends on your table's columns, your server, etc.

  • 2021-01-02 03:05

    I think you can use this query to delete the duplicate records from the table:

    ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);


    Before doing this, test it on some sample data first, and then try it.

    Note: On version 5.5, it works on MyISAM but not InnoDB.
