Remove duplicates using only a MySQL query?

死守一世寂寞 2020-11-27 07:55

I have a table with the following columns:

URL_ID    
URL_ADDR    
URL_Time

I want to remove the rows that have duplicate values in the URL_ADDR column.

7 Answers
  • 2020-11-27 07:57

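    You may want to try the method described at http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/, which adds a unique index and lets MySQL discard the rows that violate it. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so the statement below only works on older versions.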

    ALTER IGNORE TABLE your_table ADD UNIQUE INDEX `tmp_index` (URL_ADDR);
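
    On MySQL 5.7.4 and later, where ALTER IGNORE TABLE is no longer accepted, a similar effect can probably be had by copying the rows into a new table that already carries the unique index. This is only a sketch: your_table_dedup and your_table_old are made-up names, URL_ADDR is assumed to be a type that can be indexed in full (for example a VARCHAR of moderate length), and the ORDER BY is there so that the row with the lowest URL_ID is the one that survives.

    ```sql
    -- Empty copy of the original table, with the same columns and indexes.
    CREATE TABLE your_table_dedup LIKE your_table;

    -- Enforce uniqueness on the copy before loading it.
    ALTER TABLE your_table_dedup ADD UNIQUE INDEX tmp_index (URL_ADDR);

    -- INSERT IGNORE skips rows that would violate the unique index.
    INSERT IGNORE INTO your_table_dedup
    SELECT * FROM your_table ORDER BY URL_ID;

    -- Swap the tables; the original stays around as your_table_old until verified.
    RENAME TABLE your_table TO your_table_old,
                 your_table_dedup TO your_table;
    ```

    Keeping your_table_old until the result has been checked makes the change easy to undo.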
    
  • 2020-11-27 07:57

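    This keeps the row with the highest URL_ID for each URL_ADDR and deletes the others. It will work provided that your URL_ID column is unique. It is written as a multi-table DELETE because MySQL rejects a subquery that selects directly from the table being deleted from (error 1093).

    DELETE a
    FROM url a
    INNER JOIN (
        SELECT URL_ADDR, MAX(URL_ID) AS MaxURLId
        FROM url
        GROUP BY URL_ADDR
        HAVING COUNT(*) > 1
    ) b ON a.URL_ADDR = b.URL_ADDR AND a.URL_ID <> b.MaxURLId;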
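
    Before deleting anything, it may be worth checking which addresses are affected. A minimal sketch, reusing the same grouping as the DELETE; the aliases copies and kept_id are only illustrative names:

    ```sql
    -- One row per duplicated URL_ADDR: how many copies exist
    -- and which URL_ID (the highest) would be kept.
    SELECT URL_ADDR,
           COUNT(*)    AS copies,
           MAX(URL_ID) AS kept_id
    FROM url
    GROUP BY URL_ADDR
    HAVING COUNT(*) > 1;
    ```

    If this returns no rows, there is nothing for the DELETE to remove.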
    
  • 2020-11-27 07:59

    Daniel Vassallo, how would this be done for multiple columns?

    DELETE t2
    FROM   directory1 t1
    JOIN   directory1 t2
      ON   (t2.page = t1.page AND t2.parentTopic = t1.parentTopic AND t2.title = t1.title
            AND t2.description = t1.description AND t2.linktype = t1.linktype
            AND t2.priority = t1.priority AND t2.linkID > t1.linkID);

    Maybe like this?

  • 2020-11-27 08:07

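    You can GROUP BY URL_ADDR, which returns one row per distinct URL_ADDR value. Note that this only filters duplicates out of the result set; it does not delete anything from the table. The non-grouped columns are wrapped in MIN() so that the query also runs with ONLY_FULL_GROUP_BY enabled, which is the default since MySQL 5.7.5.

    select
     min(URL_ID) as URL_ID,
     URL_ADDR,
     min(URL_Time) as URL_Time
    from
     some_table
    group by
     URL_ADDR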
    

    Enjoy!

  • 2020-11-27 08:12

    Consider the following test case:

    CREATE TABLE mytb (url_id int, url_addr varchar(100));
    
    INSERT INTO mytb VALUES (1, 'www.google.com');
    INSERT INTO mytb VALUES (2, 'www.microsoft.com');
    INSERT INTO mytb VALUES (3, 'www.apple.com');
    INSERT INTO mytb VALUES (4, 'www.google.com');
    INSERT INTO mytb VALUES (5, 'www.cnn.com');
    INSERT INTO mytb VALUES (6, 'www.apple.com');
    

    Where our test table now contains:

    SELECT * FROM mytb;
    +--------+-------------------+
    | url_id | url_addr          |
    +--------+-------------------+
    |      1 | www.google.com    |
    |      2 | www.microsoft.com |
    |      3 | www.apple.com     |
    |      4 | www.google.com    |
    |      5 | www.cnn.com       |
    |      6 | www.apple.com     |
    +--------+-------------------+
    6 rows in set (0.00 sec)
    

    Then we can use the multiple-table DELETE syntax as follows:

    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
    

    ... which will delete duplicate entries, leaving only the first url based on url_id:

    SELECT * FROM mytb;
    +--------+-------------------+
    | url_id | url_addr          |
    +--------+-------------------+
    |      1 | www.google.com    |
    |      2 | www.microsoft.com |
    |      3 | www.apple.com     |
    |      5 | www.cnn.com       |
    +--------+-------------------+
    4 rows in set (0.00 sec)
    

    UPDATE - Further to new comments above:

    If the duplicate URLs do not all share the same format, you may want to apply the REPLACE() function to strip the www. or http:// parts before comparing. For example:

    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') = 
                       REPLACE(t1.url_addr, 'www.', '') AND 
                       t2.url_id > t1.url_id);
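
    To strip more than one part at once, the REPLACE() calls can be nested. The following is a sketch along the same lines; handling 'https://' is an assumption added here, not something the question asked for. Keep in mind that REPLACE() removes every occurrence of the substring, not just a leading one, and that wrapping the column in a function generally prevents MySQL from using an index on url_addr for the join.

    ```sql
    -- Compare addresses with the scheme and the www. part removed,
    -- and delete the later row (higher url_id) of each matching pair.
    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2
      ON   REPLACE(REPLACE(REPLACE(t2.url_addr, 'https://', ''), 'http://', ''), 'www.', '')
         = REPLACE(REPLACE(REPLACE(t1.url_addr, 'https://', ''), 'http://', ''), 'www.', '')
     AND   t2.url_id > t1.url_id;
    ```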
    
  • 2020-11-27 08:16

    Well, you could always:

    1. create a temporary table;
    2. INSERT INTO ... SELECT DISTINCT from the original table into the temp table;
    3. clear the original table;
    4. INSERT INTO ... SELECT from the temp table back into the original table;
    5. drop the temp table.

    It's clumsy and awkward, and requires several queries (not to mention privileges), but it will do the trick if you don't find another solution.
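
    The steps above might look roughly like this. It is a sketch under several assumptions: the table is called url as in the other answers, url_tmp and keep_id are made-up names, URL_ID is unique, and the table uses a transactional engine such as InnoDB. A GROUP BY join is used in place of SELECT DISTINCT, because DISTINCT over all columns would not collapse rows whose URL_ID differs.

    ```sql
    -- Steps 1 and 2: temporary table holding one full row per URL_ADDR
    -- (the row with the lowest URL_ID is the one kept).
    CREATE TEMPORARY TABLE url_tmp AS
    SELECT u.*
    FROM url u
    JOIN (SELECT MIN(URL_ID) AS keep_id
          FROM url
          GROUP BY URL_ADDR) k ON u.URL_ID = k.keep_id;

    -- Steps 3 and 4: empty the original table and refill it from the copy.
    START TRANSACTION;
    DELETE FROM url;
    INSERT INTO url SELECT * FROM url_tmp;
    COMMIT;

    -- Step 5: drop the temporary table.
    DROP TEMPORARY TABLE url_tmp;
    ```

    DELETE is used here rather than TRUNCATE so that emptying and refilling the table can happen inside one transaction.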
