I have a table with many duplicated rows and no primary key.
I want to remove only the extra copies, but my attempts delete every copy of a duplicated row, including the one I want to keep.
How can I find the ROWID of a row in a Postgres table?
The nested-subquery version of this DELETE can be simplified by one query level:
DELETE FROM table_name
WHERE ctid NOT IN (
SELECT min(ctid)
FROM table_name
GROUP BY $other_columns);
... where duplicates are defined by equality in $other_columns.
There is no need to include columns from the GROUP BY clause in the SELECT list, so you don't need another subquery.
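Before running the DELETE, it may help to preview which rows it would remove by using the same predicate in a SELECT. A minimal sketch, assuming a hypothetical demo table user_department_demo with columns user_id and department_id (these names are illustrative and not from the question):

```sql
-- Hypothetical demo table; the name and columns are assumptions for illustration.
CREATE TEMP TABLE user_department_demo (user_id int, department_id int);

INSERT INTO user_department_demo VALUES
    (1, 10), (1, 10), (1, 10),   -- three copies: two are surplus
    (2, 20),                     -- unique row
    (2, 30), (2, 30);            -- two copies: one is surplus

-- Preview: every row whose ctid is not the smallest ctid in its group.
-- These are exactly the rows the DELETE with the same WHERE clause would remove.
SELECT ctid, user_id, department_id
FROM user_department_demo
WHERE ctid NOT IN (
    SELECT min(ctid)
    FROM user_department_demo
    GROUP BY user_id, department_id);
```

With this data the preview lists three rows (two copies of 1, 10 and one copy of 2, 30). Once the list looks right, replacing the SELECT line with DELETE removes them.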
In PostgreSQL, the physical location of a row is exposed through the system column CTID.
To view it, use a query like this:
SELECT CTID FROM table_name
To use it in a DELETE statement that removes the duplicated records:
DELETE FROM table_name
WHERE CTID NOT IN (
    SELECT RECID
    FROM (SELECT MIN(CTID) AS RECID, other_columns
          FROM table_name
          GROUP BY other_columns) a);
Here table_name is the target table and other_columns are the columns whose values define a duplicate.
For example:
DELETE FROM user_department
WHERE CTID NOT IN (
    SELECT RECID
    FROM (SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
          FROM user_department ud
          GROUP BY ud.user_id, ud.department_id) a);
Consider using row_number() if you want to choose which copy to keep based on a unique id column (or a timestamp), since ctid alone is not always reliable when you want to keep only the most recent records, for example.
WITH d AS (
    SELECT ctid AS c,
           row_number() OVER (PARTITION BY s ORDER BY id) AS rn
    FROM t)
DELETE FROM t
WHERE ctid IN (SELECT c FROM d WHERE rn > 1);
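To keep the most recent copy instead of the one with the lowest id, the same pattern should work with a descending sort on a timestamp. A minimal sketch, assuming a hypothetical table t_demo with a grouping column s and a timestamp column created_at (both names are made up for illustration):

```sql
-- Hypothetical demo table; names and data are assumptions for illustration.
CREATE TEMP TABLE t_demo (s text, created_at timestamp);

INSERT INTO t_demo VALUES
    ('a', '2024-01-01 10:00'),
    ('a', '2024-01-02 10:00'),   -- newest 'a' row: should survive
    ('b', '2024-01-01 10:00');   -- only 'b' row: should survive

-- Number the rows of each group newest-first, then delete everything
-- except the first (newest) row of each group, addressed by ctid.
WITH d AS (
    SELECT ctid AS c,
           row_number() OVER (PARTITION BY s ORDER BY created_at DESC) AS rn
    FROM t_demo)
DELETE FROM t_demo
WHERE ctid IN (SELECT c FROM d WHERE rn > 1);

-- Remaining rows: ('a', 2024-01-02 10:00) and ('b', 2024-01-01 10:00).
SELECT s, created_at FROM t_demo ORDER BY s;
```

The ORDER BY inside the window decides which copy gets rn = 1 and is kept, so changing the sort column or direction changes the survivor without touching the DELETE itself.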
Source: https://stackoverflow.com/questions/15026420/how-to-use-the-physical-location-of-rows-rowid-in-a-delete-statement