Delete duplicate rows from table with no unique key

误落风尘 2021-01-13 09:46

How do I delete duplicate rows in a Postgres 9 table? The rows are complete duplicates on every field, AND there is no individual field that could be used as a unique key so

5 answers
  •  小蘑菇 (original poster)
    2021-01-13 10:22

    As you have no primary key, there is no easy way to distinguish one duplicated row from another. That is one of the reasons why it is highly recommended that every table have a primary key (*).

    So you are left with only two solutions:

    • use a temporary table, as suggested by Rahul (IMHO the simpler and cleaner way) (**)
    • use procedural SQL and a cursor, either from a procedural language such as Python or [put your preferred language here], or with PL/pgSQL. Something like (beware: untested):

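      CREATE OR REPLACE FUNCTION deduplicate() RETURNS integer AS $$
      DECLARE
       -- Sorting puts identical rows next to each other (this assumes label,
       -- release_id and catno are all of the table's columns).
       -- FOR UPDATE guarantees the cursor is updatable, which
       -- DELETE ... WHERE CURRENT OF needs when the query uses ORDER BY.
       curs CURSOR FOR SELECT * FROM releases_labels
                       ORDER BY label, release_id, catno FOR UPDATE;
       r releases_labels%ROWTYPE;    -- row currently being examined
       old releases_labels%ROWTYPE;  -- row seen on the previous iteration
       n integer;                    -- number of rows deleted so far
      BEGIN
       n := 0;
       old := NULL;
       FOR rec IN curs LOOP
        r := rec;
        -- Whole-row comparison: a match means this row repeats the previous one.
        IF r = old THEN
         DELETE FROM releases_labels WHERE CURRENT OF curs;
         n := n + 1;
        END IF;
        old := rec;
       END LOOP;
       RETURN n;
      END;
      $$ LANGUAGE plpgsql;
      
      SELECT deduplicate();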
      

      This should delete the duplicate rows and return the number of rows actually deleted. It is not necessarily the most efficient way, but you only delete the rows that need to go, so you do not have to lock the whole table.
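
    For comparison, the temporary-table option from the first bullet might look roughly like the sketch below. It is untested and not necessarily what Rahul's answer proposes; the temporary table name releases_labels_dedup is made up for illustration, and the explicit table lock is an assumption about how you want to handle concurrent writers.

    ```sql
    BEGIN;

    -- Keep other sessions from writing while the table is rebuilt.
    LOCK TABLE releases_labels IN ACCESS EXCLUSIVE MODE;

    -- Copy one instance of every distinct row into a temporary table
    -- (hypothetical name) that disappears at the end of the transaction.
    CREATE TEMPORARY TABLE releases_labels_dedup ON COMMIT DROP AS
      SELECT DISTINCT * FROM releases_labels;

    -- Empty the original table and refill it from the de-duplicated copy.
    TRUNCATE releases_labels;
    INSERT INTO releases_labels SELECT * FROM releases_labels_dedup;

    COMMIT;
    ```

    Unlike the cursor version, this rewrites every row and holds a lock on the whole table for the duration of the transaction, which is the trade-off for its simplicity.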

    (*) Fortunately, PostgreSQL offers the ctid pseudo-column, which you can use as a key. If your table contains an oid column, you can also use that, as it will never change.
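
    A minimal sketch of how ctid could serve as the key (untested; it assumes label, release_id and catno are all of the table's columns, as in the cursor example, and that your version has window functions, i.e. 8.4 or later):

    ```sql
    -- Number the copies inside each group of identical rows, then delete
    -- every copy except the first one, identifying rows by their ctid.
    DELETE FROM releases_labels
    WHERE ctid IN (
      SELECT ctid
      FROM (
        SELECT ctid,
               row_number() OVER (PARTITION BY label, release_id, catno
                                  ORDER BY ctid) AS rn
        FROM releases_labels
      ) numbered
      WHERE rn > 1
    );
    ```

    Which copy survives does not matter here, since the rows in a group are identical in every column.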

    (**) PostgreSQL's WITH allows you to do that in a single SQL statement.
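
    One way such a single statement might look, as an untested sketch: it relies on data-modifying statements inside WITH, which are only available from PostgreSQL 9.1, and the CTE name deleted is just an illustrative choice.

    ```sql
    -- Delete every row, capture the deleted rows in the CTE,
    -- and re-insert one copy of each distinct row.
    WITH deleted AS (
      DELETE FROM releases_labels
      RETURNING *
    )
    INSERT INTO releases_labels
    SELECT DISTINCT * FROM deleted;
    ```

    This plays the same role as the temporary table, with the CTE holding the intermediate copy.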

    These two points come from Nick Barnes's answer.
