I have a table that has a lot of duplicated rows and no primary key.
I want to remove only the duplicated records, but every attempt I make removes all copies of a row instead of leaving one.
Consider using row_number() if you want to decide which row to keep based on a unique id column (or a timestamp), since ctid alone is not reliable when you want, for example, to keep only the most recent record. The query below keeps the row with the lowest id in each group of equal s:
WITH d AS (
    SELECT ctid AS c,
           row_number() OVER (PARTITION BY s ORDER BY id) AS rn
    FROM t
)
DELETE FROM t
WHERE ctid IN (SELECT c FROM d WHERE rn > 1);
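To try the row_number() approach safely, here is a minimal sketch on a throwaway table. The temporary table t and its sample rows are made up for illustration; only the column names s and id come from the query above.

```sql
-- Hypothetical demo data: s defines a duplicate, id decides which row survives.
CREATE TEMP TABLE t (id int, s text);
INSERT INTO t VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'c');

-- Number the rows within each group of equal s, then delete all but the first.
WITH d AS (
    SELECT ctid AS c,
           row_number() OVER (PARTITION BY s ORDER BY id) AS rn
    FROM t
)
DELETE FROM t
WHERE ctid IN (SELECT c FROM d WHERE rn > 1);

SELECT id, s FROM t ORDER BY id;
-- remaining rows: (1, 'a'), (3, 'b'), (5, 'c')
```

Changing the window ordering to ORDER BY id DESC would keep the highest id in each group instead, which is probably what you want when id or a timestamp reflects recency.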
In PostgreSQL, the physical location of a row is exposed through the system column ctid.
To view it, use a query like this:
SELECT CTID FROM table_name
To use it on a DELETE statement to remove the duplicated records use it like this:
DELETE FROM table_name
WHERE CTID NOT IN (
    SELECT RECID
    FROM (
        SELECT MIN(CTID) AS RECID, other_columns
        FROM table_name
        GROUP BY other_columns
    ) a
);
Here table_name is the table to clean up, and other_columns are the columns whose combined values define a duplicate.
For example:
DELETE FROM user_department
WHERE CTID NOT IN (
    SELECT RECID
    FROM (
        SELECT MIN(CTID) AS RECID, ud.user_id, ud.department_id
        FROM user_department ud
        GROUP BY ud.user_id, ud.department_id
    ) a
);
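Before running a delete like this, it may help to preview which groups actually contain duplicates. A minimal sketch, reusing the user_department example (the copies alias is just an illustrative name):

```sql
-- List each (user_id, department_id) pair that occurs more than once,
-- together with how many copies exist.
SELECT ud.user_id, ud.department_id, count(*) AS copies
FROM user_department ud
GROUP BY ud.user_id, ud.department_id
HAVING count(*) > 1;
```

Each returned row is a group from which the delete would remove copies - 1 rows.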
Simplify this by one query level:
DELETE FROM table_name
WHERE ctid NOT IN (
    SELECT min(ctid)
    FROM table_name
    GROUP BY $other_columns
);
... where duplicates are defined by equality in $other_columns.
There is no need to include columns from the GROUP BY clause in the SELECT list, so you don't need another subquery.
See ctid in the current manual.