Delete duplicate rows from table with no unique key


误落风尘 2021-01-13 09:46

How do I delete duplicate rows in a Postgres 9 table? The rows are complete duplicates on every field, AND there is no individual field that could be used as a unique key, so

5 answers

小蘑菇 (original poster)

2021-01-13 10:22
As you have no primary key, there is no easy way to distinguish one duplicate row from another. That's one of the reasons why it is highly recommended that every table have a primary key (*).

So you are left with only two solutions:
- use a temporary table as suggested by Rahul (IMHO the simpler and cleaner way) (**)
- use procedural SQL and a cursor, either from a procedural language such as Python or [put your preferred language here] or with PL/pgSQL. Something like this (beware, untested):
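```
CREATE OR REPLACE FUNCTION deduplicate() RETURNS integer AS $$
DECLARE
 -- Ordering by every column puts identical rows next to each other
 -- (this assumes label, release_id, catno are all of the table's columns).
 -- FOR UPDATE keeps WHERE CURRENT OF reliable despite the ORDER BY.
 curs CURSOR FOR SELECT * FROM releases_labels ORDER BY label, release_id, catno FOR UPDATE;
 r releases_labels%ROWTYPE;
 old releases_labels%ROWTYPE;  -- previously fetched row
 n integer;                    -- number of rows deleted
BEGIN
 n := 0;
 old := NULL;
 -- rec is declared implicitly by the cursor FOR loop
 FOR rec IN curs LOOP
  r := rec;
  -- a row identical to its predecessor is a duplicate
  IF r = old THEN
   DELETE FROM releases_labels WHERE CURRENT OF curs;
   n := n + 1;
  END IF;
  old := rec;
 END LOOP;
 RETURN n;
END;
$$ LANGUAGE plpgsql;

SELECT deduplicate();
```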
This should delete the duplicate rows and return the number of rows actually deleted. It is not necessarily the most efficient way, but it deletes only the rows that need to go, so you will not have to lock the whole table.
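For comparison, the temporary-table option from the list above could look roughly like this. It is an untested sketch, not Rahul's actual code: the name `releases_labels_dedup` is made up for illustration, and it assumes that no foreign keys point at the table and that every column type supports equality, so that `SELECT DISTINCT *` works.

```sql
-- Sketch only: rewrites the whole table inside one transaction.
BEGIN;

-- Keep concurrent sessions out while the table is being rewritten.
LOCK TABLE releases_labels IN ACCESS EXCLUSIVE MODE;

-- Keep exactly one copy of every distinct row.
-- releases_labels_dedup is a hypothetical name; it is dropped at COMMIT.
CREATE TEMP TABLE releases_labels_dedup ON COMMIT DROP AS
SELECT DISTINCT * FROM releases_labels;

-- Empty the original table, then put the distinct rows back.
TRUNCATE releases_labels;
INSERT INTO releases_labels SELECT * FROM releases_labels_dedup;

COMMIT;
```

Unlike the cursor version, this locks the table for the duration of the transaction, which is the price of the simpler code.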
(*) Fortunately, PostgreSQL offers the ctid pseudo-column, which you can use as a key. If your table contains an oid column, you can also use that, as it will never change.
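To illustrate the ctid idea, here is a hedged, untested sketch that keeps, for each group of identical rows, the one with the smallest ctid and deletes the rest. The aliases `r` and `r1` are arbitrary. It assumes every column type has an equality operator, and it relies on whole-row comparison treating two NULL fields as equal, which is how PostgreSQL compares composite values outside of row constructors.

```sql
-- Sketch only: delete every row for which an identical row
-- with a smaller physical location (ctid) exists.
DELETE FROM releases_labels r
WHERE EXISTS (
    SELECT 1
    FROM releases_labels r1
    WHERE r1 = r              -- whole-row comparison, all columns
      AND r1.ctid < r.ctid    -- the row with the lowest ctid survives
);
```

Note that a row's ctid changes when the row is updated or the table is rewritten, so it is only good as a key within a single statement like this one, not as a long-term identifier.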

(**) PostgreSQL's WITH allows you to do that in a single SQL statement.
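A minimal sketch of what such a single statement might look like, assuming PostgreSQL 9.1 or later (data-modifying statements inside WITH are not available in 9.0) and column types that support `SELECT DISTINCT`. The CTE name `removed` is arbitrary, and this is untested:

```sql
-- Sketch only: delete everything, then re-insert one copy of each row.
-- Both parts run as one statement against the same snapshot.
WITH removed AS (
    DELETE FROM releases_labels
    RETURNING *
)
INSERT INTO releases_labels
SELECT DISTINCT * FROM removed;
```

Like the temporary-table approach, this rewrites every row, so it is better suited to small or medium tables than the cursor loop is.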

These two points come from Nick Barnes's answer.