How to use the physical location of rows (ROWID) in a DELETE statement

I have a table with a lot of duplicated rows and no primary key.
I want to remove only the duplicates, keeping one row from each group, but every attempt so far deletes all rows in the group, including the one I want to keep.

3 Answers

孤独总比滥情好

2021-01-06 06:32

Consider using row_number() if you want to decide which row to keep based on a unique id column (or a timestamp), since ctid alone is not reliable when, for example, you want to keep only the most recent record.

WITH d 
     AS (SELECT ctid c, 
                row_number() 
                  OVER ( 
                    partition BY s 
                    ORDER BY id) rn 
         FROM   t) 
DELETE FROM t 
WHERE  ctid IN (SELECT c 
               FROM   d 
               WHERE  rn > 1)  ;
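
As a rough sketch of what this statement acts on, the sample table below can be used to preview the rows that get rn > 1 before deleting anything. The table t and its columns id and s follow the query above; the sample rows are invented for illustration.

```sql
-- Hypothetical sample data: s holds the duplicated value, id orders the copies.
CREATE TEMP TABLE t (id int, s text);
INSERT INTO t VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'c');

-- Preview the rows the DELETE would remove: every row after the first
-- (lowest id) within each group of equal s.
SELECT id, s
FROM  (SELECT id, s,
              row_number() OVER (PARTITION BY s ORDER BY id) AS rn
       FROM   t) d
WHERE  rn > 1;
-- With the sample rows above this lists ids 2, 4 and 5.
```

To keep the most recent row instead of the oldest, the same pattern should work with ORDER BY id DESC (or a timestamp column) in the window definition.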

心在旅途

2021-01-06 06:42

In PostgreSQL, the physical location of a row is exposed through the system column ctid.

To view it, use a query like this:

SELECT CTID FROM table_name
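
For illustration, selecting ctid next to the regular columns makes otherwise identical rows distinguishable. The sample rows below are invented for the example (the table name matches the user_department example further down), and the exact ctid values depend on where PostgreSQL stores each row.

```sql
-- Hypothetical sample table with one duplicated (user_id, department_id) pair.
CREATE TEMP TABLE user_department (user_id int, department_id int);
INSERT INTO user_department VALUES (1, 10), (1, 10), (2, 20);

-- ctid is a system column, so it must be named explicitly; SELECT * omits it.
SELECT ctid, user_id, department_id
FROM   user_department;
```

Each ctid prints as a (block, offset) pair such as (0,1). The PostgreSQL manual notes that a row's ctid changes if the row is updated or moved by VACUUM FULL, so it suits one-off statements like the cleanup here, not use as a long-term identifier.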

To remove the duplicated records, use it in a DELETE statement like this:

DELETE FROM table_name WHERE CTID NOT IN (
  SELECT RECID FROM 
    (SELECT MIN(CTID) AS RECID, other_columns 
      FROM table_name GROUP BY other_columns) 
  a);

Here table_name is the table to clean up and other_columns are the columns whose equality defines a duplicate.

For example:

DELETE FROM user_department WHERE CTID NOT IN (
  SELECT RECID FROM 
    (SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
      FROM user_department ud GROUP BY ud.user_id, ud.department_id) 
  a);
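
One way to check the result is to list the groups that still contain more than one row. This is sketched here under the assumption that duplicates are defined by (user_id, department_id), as in the example above. Running it before the DELETE shows what will be cleaned up; running it afterwards should return no rows.

```sql
-- Groups of (user_id, department_id) that occur more than once.
-- An empty result means no duplicates remain.
SELECT user_id, department_id, count(*) AS copies
FROM   user_department
GROUP  BY user_id, department_id
HAVING count(*) > 1;
```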

走了就别回头了

2021-01-06 06:54
Simplify this by one query level:
```
DELETE FROM table_name
WHERE  ctid NOT IN (
   SELECT min(ctid)
   FROM   table_name
   GROUP  BY $other_columns);
```
... where duplicates are defined by equality in $other_columns.
There is no need to include columns from the GROUP BY clause in the SELECT list, so you don't need another subquery.
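
A different formulation, not the one given in this answer, is a self-join that deletes every row for which an equal row with a smaller ctid exists. The sketch below uses a throwaway table with hypothetical columns col1 and col2 standing in for $other_columns.

```sql
-- Hypothetical sample data: three copies of (1, 'x') and one (2, 'y').
CREATE TEMP TABLE table_name (col1 int, col2 text);
INSERT INTO table_name VALUES (1, 'x'), (1, 'x'), (1, 'x'), (2, 'y');

-- Delete each row a that has an equal row b stored at a smaller ctid,
-- so only the row with the smallest ctid in each group survives.
DELETE FROM table_name a
USING  table_name b
WHERE  a.col1 = b.col1
AND    a.col2 = b.col2
AND    a.ctid > b.ctid;

-- Leaves one (1, 'x') row and the (2, 'y') row.
SELECT col1, col2 FROM table_name;
```

Unlike GROUP BY, the = comparison does not treat NULLs as equal, so rows with NULL in the compared columns would not be matched as duplicates by this variant.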

ctid in the current manual.