How to use the physical location of rows (ROWID) in a DELETE statement

滥情空心 2021-01-06 06:25

I have a table with many duplicate rows and no primary key.
I want to remove only the extra copies, but when I try, my DELETE removes every row in each group of duplicates (all peers), not just the extras.

3 Answers
  • 2021-01-06 06:32

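    Consider using row_number() if you want to decide which duplicate to keep based on a unique id column (or a timestamp). ctid alone is not a reliable ordering when, for example, you want to keep only the most recent records. In the query below, s is the column that defines a duplicate and id is the column that decides which copy survives.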

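    WITH d AS (
        -- number the rows within each group of duplicates (equal s), ordered by id
        SELECT ctid AS c,
               row_number() OVER (PARTITION BY s ORDER BY id) AS rn
        FROM   t
    )
    -- rn = 1 is the row that is kept (lowest id);
    -- use ORDER BY id DESC instead to keep the newest row
    DELETE FROM t
    WHERE  ctid IN (SELECT c
                    FROM   d
                    WHERE  rn > 1);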
    

  • 2021-01-06 06:42

    In PostgreSQL, the physical location of a row is exposed as the system column ctid.

    To view it, use a query like this:

    SELECT ctid FROM table_name;
    

    To use it in a DELETE statement that removes the duplicate records, write it like this:

    DELETE FROM table_name WHERE CTID NOT IN (
      SELECT RECID FROM 
        (SELECT MIN(CTID) AS RECID, other_columns 
          FROM table_name GROUP BY other_columns) 
      a);
    

    Here, table_name is the target table and other_columns are the columns whose values define a duplicate.

    For example:

    DELETE FROM user_department WHERE CTID NOT IN (
      SELECT RECID FROM 
        (SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
          FROM user_department ud GROUP BY ud.user_id, ud.department_id) 
      a);
    
  • 2021-01-06 06:54

    This can be simplified by one query level:

    DELETE FROM table_name
    WHERE  ctid NOT IN (
       SELECT min(ctid)
       FROM   table_name
       GROUP  BY $other_columns);
    

    ... where duplicates are defined by equality on $other_columns.
    Columns from the GROUP BY clause do not have to appear in the SELECT list, so the extra subquery is unnecessary.

    See ctid in the current manual.
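    To try the min-per-group pattern from the queries above without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It swaps PostgreSQL's ctid for SQLite's rowid, SQLite's own implicit row identifier (it needs no primary key, but it is not a physical address like ctid). The table user_department and its sample rows are hypothetical, chosen to mirror the example above, so treat this as an analogue of the technique rather than a test of PostgreSQL behaviour.

```python
import sqlite3


def dedupe(conn: sqlite3.Connection) -> int:
    """Keep one row per (user_id, department_id) and delete the other copies.

    SQLite analogue of the PostgreSQL query: rowid stands in for ctid.
    Returns the number of deleted rows.
    """
    cur = conn.execute(
        """
        DELETE FROM user_department
        WHERE rowid NOT IN (
            SELECT min(rowid)           -- the surviving row of each group
            FROM user_department
            GROUP BY user_id, department_id
        )
        """
    )
    conn.commit()
    return cur.rowcount


def main() -> list:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with no primary key, as in the question.
    conn.execute("CREATE TABLE user_department (user_id INTEGER, department_id INTEGER)")
    conn.executemany(
        "INSERT INTO user_department VALUES (?, ?)",
        [(1, 10), (1, 10), (1, 10), (2, 10), (2, 20), (2, 20)],
    )
    dedupe(conn)
    rows = conn.execute(
        "SELECT user_id, department_id FROM user_department "
        "ORDER BY user_id, department_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(main())  # [(1, 10), (2, 10), (2, 20)]
```

    Because NOT IN keeps exactly the min(rowid) of each group, one copy of every distinct (user_id, department_id) pair survives and the other three rows are deleted.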
