Remove duplicate rows in MySQL

囚心锁ツ 2020-11-21 04:33

I have a table with the following fields:

id (Unique)
url (Unique)
title
company
site_id

Now, I need to remove rows having same titl

25 answers
  • 2020-11-21 05:25

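    To delete the duplicate records in a table. Note that rowid is an Oracle pseudocolumn; on MySQL the table's unique id column has to stand in for it, and MySQL also rejects a subquery that reads the table being deleted from unless that subquery is wrapped in a derived table.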

    delete from job s 
    where rowid < any 
    (select rowid from job k 
    where s.site_id = k.site_id and 
    s.title = k.title and 
    s.company = k.company);
    

    or

    delete from job s 
    where rowid not in 
    (select max(rowid) from job k 
    where s.site_id = k.site_id and
    s.title = k.title and 
    s.company = k.company);
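
    One possible MySQL-friendly variant keys on the table's own id and wraps the subquery in a derived table, so the DELETE does not read its target table directly. The sketch below is not the answer's code: it runs that statement against an in-memory SQLite database from Python purely so the outcome can be checked, and the sample rows and the dedupe_job helper are made up for illustration.

```python
import sqlite3

# Keep the highest id per (site_id, title, company) group, delete the rest.
# The inner SELECT is wrapped in a derived table (alias "keepers").
DEDUPE_SQL = """
DELETE FROM job
WHERE id NOT IN (
    SELECT keep_id FROM (
        SELECT MAX(id) AS keep_id
        FROM job
        GROUP BY site_id, title, company
    ) AS keepers
)
"""

def dedupe_job(rows):
    """Load (id, url, title, company, site_id) rows, dedupe, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, url TEXT UNIQUE, "
        "title TEXT, company TEXT, site_id INTEGER)"
    )
    conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(DEDUPE_SQL)
    survivors = [r[0] for r in conn.execute("SELECT id FROM job ORDER BY id")]
    conn.close()
    return survivors

# Hypothetical sample data.
sample = [
    (1, "u1", "Dev", "Acme", 10),
    (2, "u2", "Dev", "Acme", 10),  # duplicate of id 1
    (3, "u3", "Dev", "Acme", 11),  # different site_id, so not a duplicate
    (4, "u4", "Ops", "Acme", 10),
]
print(dedupe_job(sample))
```

    Swapping MAX for MIN keeps the oldest row of each group instead of the newest.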
    
  • 2020-11-21 05:25
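    -- Here is what I used, and it works:
    create table temp_table like my_table;
    -- t_id is the column that should be unique.
    -- min(id) picks one definite row per t_id; a bare "select id ... group by t_id"
    -- is rejected when ONLY_FULL_GROUP_BY is enabled.
    insert into temp_table (id) select min(id) from my_table group by t_id;
    delete from my_table where id not in (select id from temp_table);
    drop table temp_table;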
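
    The keep-table idea can be tried on throwaway data before touching a real table. This is only a sketch under assumptions: SQLite stands in for MySQL (so the helper table is created explicitly rather than with CREATE TABLE ... LIKE), MIN(id) is used to make the kept row explicit, and the function name and sample rows are invented for the example.

```python
import sqlite3

def dedupe_with_keep_table(rows):
    """rows are (id, t_id) pairs; keeps the lowest id per t_id, returns what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, t_id TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

    # Helper table holding exactly one id for each distinct t_id.
    conn.execute("CREATE TABLE temp_table (id INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO temp_table (id) SELECT MIN(id) FROM my_table GROUP BY t_id"
    )
    # Everything not listed in the helper table is a surplus duplicate.
    conn.execute("DELETE FROM my_table WHERE id NOT IN (SELECT id FROM temp_table)")
    conn.execute("DROP TABLE temp_table")

    remaining = conn.execute("SELECT id, t_id FROM my_table ORDER BY id").fetchall()
    conn.close()
    return remaining

print(dedupe_with_keep_table([(1, "a"), (2, "a"), (3, "b"), (4, "a")]))
```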
    
  • 2020-11-21 05:26

    A solution that is simple to understand and works with no primary key:

    1) add a new boolean column

    alter table mytable add tokeep boolean;
    

    2) add a constraint on the duplicated columns AND the new column

    alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);
    

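    3) set the boolean column to true. This succeeds on only one row of each duplicate group: a unique constraint treats NULLs as distinct, so every row is accepted while tokeep is still NULL, but a second row with the same values and tokeep = true would violate the constraint, and IGNORE skips that row instead of failing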

    update ignore mytable set tokeep = true;
    

    4) delete rows that have not been marked as tokeep

    delete from mytable where tokeep is null;
    

    5) drop the added column

    alter table mytable drop tokeep;
    

    I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
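
    The steps above can be walked through on a toy table. The sketch below uses SQLite from Python as a stand-in, which means two spelling changes that are assumptions of this example, not part of the answer: SQLite writes the ignore form as UPDATE OR IGNORE, and the constraint is created with CREATE UNIQUE INDEX. Step 5 (dropping the column) is left out, and dedupe_with_flag is a hypothetical helper name.

```python
import sqlite3

def dedupe_with_flag(rows):
    """rows are (mycol1, mycol2) pairs; returns the rows left after deduplication."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (mycol1 TEXT, mycol2 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

    # 1) New nullable flag column; it is NULL for every existing row.
    conn.execute("ALTER TABLE mytable ADD COLUMN tokeep BOOLEAN")
    # 2) Uniqueness over the duplicated columns plus the flag.
    #    Rows whose flag is NULL never collide with each other.
    conn.execute(
        "CREATE UNIQUE INDEX preventdupe ON mytable (mycol1, mycol2, tokeep)"
    )
    # 3) Flag the rows; an update that would break uniqueness is skipped.
    conn.execute("UPDATE OR IGNORE mytable SET tokeep = 1")
    # 4) Rows that could not be flagged are the surplus duplicates.
    conn.execute("DELETE FROM mytable WHERE tokeep IS NULL")

    remaining = conn.execute(
        "SELECT mycol1, mycol2 FROM mytable ORDER BY mycol1, mycol2"
    ).fetchall()
    conn.close()
    return remaining

print(dedupe_with_flag([("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")]))
```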

  • 2020-11-21 05:27

    I found a simple way. It keeps the latest row, meaning the one with the highest id in each duplicate group:

    DELETE t1 FROM tablename t1 INNER JOIN tablename t2 
    WHERE t1.id < t2.id AND t1.column1 = t2.column1 AND t1.column2 = t2.column2;
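
    Before running a DELETE like this, the same join can be issued as a SELECT to preview which ids would be removed. The following is a sketch of that dry run, assuming an in-memory SQLite database as a stand-in for MySQL; the sample rows and the ids_that_would_be_deleted helper are hypothetical.

```python
import sqlite3

# Same join condition as the DELETE, but only listing the rows it would hit.
PREVIEW_SQL = """
SELECT DISTINCT t1.id
FROM tablename t1
INNER JOIN tablename t2
    ON t1.id < t2.id
   AND t1.column1 = t2.column1
   AND t1.column2 = t2.column2
ORDER BY t1.id
"""

def ids_that_would_be_deleted(rows):
    """rows are (id, column1, column2); returns ids that have a newer duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    doomed = [r[0] for r in conn.execute(PREVIEW_SQL)]
    conn.close()
    return doomed

rows = [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x")]
print(ids_that_would_be_deleted(rows))
```

    Because the comparison uses =, rows with NULL in column1 or column2 never match each other and are left alone.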
    
  • 2020-11-21 05:28

    A faster way is to insert the distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.

    CREATE TABLE tempTableName LIKE tableName;  
    CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);  
    INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;  
    TRUNCATE TABLE tableName;
    INSERT INTO tableName SELECT * FROM tempTableName; 
    DROP TABLE tempTableName;  
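
    The copy-distinct-and-reload sequence can be rehearsed on a small table first. This sketch assumes SQLite as a stand-in, so the scratch table is built with CREATE TABLE ... AS SELECT and DELETE FROM replaces TRUNCATE TABLE; the column types, sample rows and the rebuild_distinct name are invented for illustration.

```python
import sqlite3

COLS = "cellId, attributeId, entityRowId, value"

def rebuild_distinct(rows):
    """rows are (cellId, attributeId, entityRowId, value); returns deduplicated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableName "
        "(cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT)"
    )
    conn.executemany("INSERT INTO tableName VALUES (?, ?, ?, ?)", rows)

    # Copy one instance of every distinct row into a scratch table.
    conn.execute("CREATE TABLE tempTableName AS SELECT DISTINCT " + COLS + " FROM tableName")
    # Empty the original (stands in for TRUNCATE TABLE) and load the copy back.
    conn.execute("DELETE FROM tableName")
    conn.execute("INSERT INTO tableName SELECT * FROM tempTableName")
    conn.execute("DROP TABLE tempTableName")

    remaining = conn.execute(
        "SELECT " + COLS + " FROM tableName ORDER BY " + COLS
    ).fetchall()
    conn.close()
    return remaining

print(rebuild_distinct([(1, 1, 1, "v"), (1, 1, 1, "v"), (2, 1, 1, "w")]))
```

    On MySQL, TRUNCATE TABLE commits implicitly and cannot be rolled back, so it may be worth checking the row count of the temporary table before truncating and dropping it only after the reload has been verified.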
    
  • 2020-11-21 05:28

    I had to do this with text fields and came across the limit of 100 bytes on the index.

    I solved this by adding a column, filling it with an MD5 hash of the fields, and then doing the alter.

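    -- `table` is a placeholder name; it is a reserved word, so it must be backtick-quoted
    ALTER TABLE `table` ADD `merged` VARCHAR( 40 ) NOT NULL ;
    UPDATE `table` SET `merged` = MD5(CONCAT(`col1`, `col2`, `col3`));
    -- ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this line needs an older server
    ALTER IGNORE TABLE `table` ADD UNIQUE INDEX idx_name (`merged`);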
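
    One caveat with hashing a plain concatenation is that different column values can join into the same string, so distinct rows may end up with the same hash and be treated as duplicates. The Python sketch below only mimics the idea with hashlib and assumes non-NULL, UTF-8 string columns; merged_plain and merged_separated are hypothetical helpers, and the separator variant corresponds roughly to using CONCAT_WS instead of CONCAT.

```python
import hashlib

def merged_plain(*cols):
    """Mimics MD5(CONCAT(col1, col2, ...)) for non-NULL string columns."""
    return hashlib.md5("".join(cols).encode("utf-8")).hexdigest()

def merged_separated(*cols, sep="\x1f"):
    """Same idea with a separator between columns, like MD5(CONCAT_WS(sep, ...))."""
    return hashlib.md5(sep.join(cols).encode("utf-8")).hexdigest()

# ("ab", "c") and ("a", "bc") are different rows but concatenate to "abc".
print(merged_plain("ab", "c") == merged_plain("a", "bc"))
print(merged_separated("ab", "c") == merged_separated("a", "bc"))
# An MD5 hex digest is 32 characters, so it fits in the VARCHAR(40) column.
print(len(merged_plain("ab", "c")))
```

    Also note that in MySQL CONCAT returns NULL as soon as any argument is NULL, which would leave rows with a NULL field unhashed.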
    