Remove duplicates in a large MySQL table

自闭症患者 2021-01-06 14:44

I have a question about MySQL. I have a table with 7,479,194 records. Some records are duplicated. I would like to do this:

    insert into new_table
      select *
6 Answers
  • 2021-01-06 15:18

    A bit dirty maybe, but it has done the trick for me the few times that I've needed it: Remove duplicate entries in MySQL.

    Basically, you simply create a unique index consisting of all the columns that you want to be unique in the table.

    As always with this kind of procedure, taking a backup before proceeding is recommended.
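
    For example, a minimal sketch (the column names city, post_code and short_code are borrowed from the other answers; note that ALTER IGNORE only exists in MySQL 5.6 and earlier, it was removed in 5.7):

        -- Rows that would violate the new unique index are silently dropped,
        -- keeping the first occurrence of each duplicate group.
        ALTER IGNORE TABLE old_table
          ADD UNIQUE INDEX uniq_row (city, post_code, short_code);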

  • 2021-01-06 15:24

    From my experience, once your table grows to millions of records or more, the most effective way to handle duplicates is to: 1) export the data to a text file, 2) sort the file, 3) remove the duplicates in the file, 4) load it back into the database.

    As the data keeps growing, this approach eventually becomes faster than any SQL query you may invent.
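
    A minimal sketch of that pipeline (the file paths are assumptions; the sort step runs in the shell and is shown as a comment, and secure_file_priv may restrict where the server can read and write files):

        SELECT * FROM old_table
          INTO OUTFILE '/tmp/old_table.txt';

        -- shell: sort -u /tmp/old_table.txt > /tmp/old_table.dedup.txt

        LOAD DATA INFILE '/tmp/old_table.dedup.txt'
          INTO TABLE new_table;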

  • 2021-01-06 15:28

    To avoid the memory issue, avoid the big select by having a small external program that uses the logic below. First, back up your database. Then:

    do {
      # grab one remaining record
      x = sql: SELECT * FROM table1 LIMIT 1;
      if (x is null)
      then
        exit  # no more data in table1
      fi
      insert x into table2

      # find the value of the field that should NOT be duplicated
      a = parse(x for table1.a)
      # delete all rows with that value from table1 ("DELETE *" is not valid SQL)
      sql: DELETE FROM table1 WHERE a = '$a';
    } while (true)
    
  • 2021-01-06 15:28

    You don't need to group the data. Try this:

        DELETE FROM old_table
         USING old_table, old_table AS vtable
         WHERE old_table.id > vtable.id
           AND old_table.city = vtable.city
           AND old_table.post_code = vtable.post_code
           AND old_table.short_code = vtable.short_code;
    

    I can't comment on posts because of my reputation points. First run repair table old_table; and then check the query plan:

        EXPLAIN SELECT old_table.id
          FROM old_table, old_table AS vtable
         WHERE old_table.id > vtable.id
           AND old_table.city = vtable.city
           AND old_table.post_code = vtable.post_code
           AND old_table.short_code = vtable.short_code;
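
    If the EXPLAIN shows a full scan on the self-join, an index covering the compared columns should help. A sketch, assuming city, post_code and short_code are the columns that define a duplicate:

        -- speeds up the self-join used by the DELETE and EXPLAIN above
        ALTER TABLE old_table
          ADD INDEX idx_dup_check (city, post_code, short_code);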
    

    Check the current limits:

        os~> ulimit -a
        mysql> SHOW VARIABLES LIKE 'open_files_limit';

    Next, remove any OS restrictions from the mysql process:

        ulimit -n 1024

    etc.

  • 2021-01-06 15:29

    MySQL has an INSERT IGNORE. From the docs:

        [...] however, when INSERT IGNORE is used, the insert operation fails silently for the row containing the unmatched value, but any rows that are matched are inserted.

    So you could use your query from above by just adding an IGNORE.
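
    For IGNORE to skip the duplicates, new_table needs a unique key over the deduplicating columns. A minimal sketch (the column names are borrowed from the other answers):

        ALTER TABLE new_table
          ADD UNIQUE INDEX uniq_row (city, post_code, short_code);

        -- duplicate rows fail silently instead of aborting the insert
        INSERT IGNORE INTO new_table
          SELECT * FROM old_table;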

  • 2021-01-06 15:37

    This will populate NEW_TABLE with unique values, where id is the lowest id of each group:

    INSERT INTO NEW_TABLE
      SELECT MIN(ot.id),
             ot.city,
             ot.post_code,
             ot.short_code
        FROM OLD_TABLE ot
    GROUP BY ot.city, ot.post_code, ot.short_code;
    

    If you want the highest id value per group:

    INSERT INTO NEW_TABLE
      SELECT MAX(ot.id),
             ot.city,
             ot.post_code,
             ot.short_code
        FROM OLD_TABLE ot
    GROUP BY ot.city, ot.post_code, ot.short_code;
    