问题
I am using postgres. I want to delete Duplicate rows. The condition is that , 1 copy from the set of duplicate rows would not be deleted.
i.e : if there are 5 duplicate records then 4 of them will be deleted.
回答1:
Try the steps described in this article: Removing duplicates from a PostgreSQL database.
It describes a situation when you have to deal with huge amount of data which isn't possible to group by
.
A simple solution would be this:
DELETE FROM foo
WHERE id NOT IN (SELECT min(id) --or max(id)
FROM foo
GROUP BY hash)
Where hash
is something that gets duplicated.
回答2:
delete from table
where not id in
(select max(id) from table group by [duplicate row])
This is random (max Value) choice which row you need to keep. If you have aggre whit this please provide more details
回答3:
The fastest is is join to the same table. http://www.postgresql.org/docs/8.1/interactive/sql-delete.html
CREATE TABLE test(id INT,id2 INT);
CREATE TABLE
mapy=# INSERT INTO test VALUES(1,2);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,3);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,4);
INSERT 0 1
DELETE FROM test t1 USING test t2 WHERE t1.id=t2.id AND t1.id2<t2.id2;
DELETE 2
mapy=# SELECT * FROM test;
id | id2
----+-----
1 | 4
(1 row)
回答4:
delete from table t1
where rowid > (SELECT min(rowid) FROM table t2 group by
t2.id,t2.name );
来源:https://stackoverflow.com/questions/24669366/find-duplicate-rows-and-keep-the-latest-one-delete-the-rest