Deleting Duplicates in MySQL

清歌不尽 2020-12-04 03:17

I have a table like this:

userid  visitorid   time
1       10          2009-12-23
1       18          2009-12-06
1       18          2009-12-14
1       18             


        
4 Answers
  • 2020-12-04 03:37
    Delete from YourTable VersionA
      where VersionA.Time NOT IN
        ( select MAX( VersionB.Time ) Time
             from YourTable VersionB
             where VersionA.UserID = VersionB.UserID
               and VersionA.VisitorID = VersionB.VisitorID )
    

    The syntax might need adjusting (MySQL, for one, rejects a subquery that reads from the table being deleted from, raising error 1093), but the logic should do the trick. You may also want to pre-query the subselect into its own table first, then run the DELETE against that result set.
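The "pre-query into its own table first" idea can be sketched as follows. This is only an illustration under stated assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than against MySQL, the table name `visits` and the staging table `keep_rows` are hypothetical, and the sample data uses only the three complete rows from the question.

```python
import sqlite3

# Hypothetical table name `visits`; rows are the complete ones from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (userid INTEGER, visitorid INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14")],
)

# Step 1: stage the newest time per (userid, visitorid) pair in its own table.
conn.execute(
    """
    CREATE TEMP TABLE keep_rows AS
    SELECT userid, visitorid, MAX(time) AS time
    FROM visits
    GROUP BY userid, visitorid
    """
)

# Step 2: delete every row that has no exact match in the staging table.
# The subquery reads only keep_rows, never the table being deleted from.
conn.execute(
    """
    DELETE FROM visits
    WHERE NOT EXISTS (
        SELECT 1 FROM keep_rows k
        WHERE k.userid = visits.userid
          AND k.visitorid = visits.visitorid
          AND k.time = visits.time
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT userid, visitorid, time FROM visits ORDER BY visitorid"
).fetchall()
```

Because the DELETE's subquery touches only the staging table, the same two-step shape should sidestep the "can't specify target table" restriction in MySQL, though the exact statements may need adjusting there.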

  • 2020-12-04 03:41
    DELETE  mo.*
    FROM    (
            SELECT  userid, visitorid, MAX(time) AS mtime
            FROM    mytable
            GROUP BY
                    userid, visitorid
            ) mi
    JOIN    mytable mo
    ON      mo.userid = mi.userid
            AND mo.visitorid = mi.visitorid
            AND mo.time < mi.mtime
    
  • 2020-12-04 03:48

    Assuming your table is called Visitors:

    DELETE v1.* FROM Visitors v1
    LEFT JOIN (
        SELECT userid, visitorid, MAX(time) AS time
        FROM Visitors v2
        GROUP BY userid, visitorid
    ) v3 ON v1.userid=v3.userid AND v1.visitorid=v3.visitorid AND v1.time = v3.time
    WHERE v3.userid IS NULL;
    
  • 2020-12-04 03:59

    You need to work around MySQL bug#6980, with a doubly nested subquery:

    DELETE FROM foo_table
    WHERE foo_table.time IN (
        SELECT time FROM (
            SELECT time FROM
                foo_table
                LEFT OUTER JOIN (
                    SELECT MAX(time) AS time
                    FROM foo_table
                    GROUP BY userid, visitorid
                    ) AS foo_table_keep
                        USING (time)
            WHERE
                foo_table_keep.time IS NULL
            ) AS foo_table_delete
        );
    

    GROUP BY collapses the duplicates in each (userid, visitorid) group down to a single row, and MAX(time) chooses which value to keep. Substitute a different aggregate function, such as MIN, if you want to keep another row.

    Wrapping the subquery twice, providing aliases for each, avoids the error:

    ERROR 1093 (HY000): You can't specify target table 'foo_table' for update in FROM clause
    

    and has the extra advantage that it's clearer how the statement is choosing what to keep.
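To make the role of the aggregate concrete, here is a small sketch of which row each group would keep under MAX versus MIN. It is an assumption-laden illustration rather than part of the answer's statement: it uses Python's `sqlite3` with an in-memory database instead of MySQL, the helper name `survivors` is hypothetical, and the data is the three complete rows from the question.

```python
import sqlite3

# The three complete rows from the question's sample table.
ROWS = [(1, 10, "2009-12-23"), (1, 18, "2009-12-06"), (1, 18, "2009-12-14")]


def survivors(aggregate):
    """Return the row each (userid, visitorid) group would keep."""
    # Whitelist the aggregate name, since it is spliced into the SQL text.
    if aggregate not in ("MAX", "MIN"):
        raise ValueError("unsupported aggregate")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo_table (userid INTEGER, visitorid INTEGER, time TEXT)"
    )
    conn.executemany("INSERT INTO foo_table VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        f"SELECT userid, visitorid, {aggregate}(time) FROM foo_table "
        "GROUP BY userid, visitorid ORDER BY visitorid"
    ).fetchall()
    conn.close()
    return rows


newest = survivors("MAX")  # keeps the latest visit per pair
oldest = survivors("MIN")  # keeps the earliest visit per pair
```

Running the grouped SELECT on its own like this, before wrapping it in a DELETE, is also a cheap way to check what would be kept.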
