Question
Yes, similar questions have been asked many times, but the most elegant solutions posted here work for SQL Server, not for Sybase (in my case Sybase SQL Anywhere 11). I have even found some Sybase-related questions closed as duplicates of SQL Server questions, which doesn't help.
One solution I liked that didn't work is the WITH ... DELETE ... construct.
I have found working solutions using cursors or WHILE loops, but I hope it is possible without loops: a nice, simple and fast query that deletes all but one row of each set of exact duplicates.
Here is a small test harness:
IF OBJECT_ID( 'tempdb..#TestTable' ) IS NOT NULL
DROP TABLE #TestTable;
CREATE TABLE #TestTable (Column1 varchar(1), Column2 int);
INSERT INTO #TestTable VALUES ('A', 1);
INSERT INTO #TestTable VALUES ('A', 1); -- duplicate
INSERT INTO #TestTable VALUES ('A', 1); -- duplicate
INSERT INTO #TestTable VALUES ('A', 2);
INSERT INTO #TestTable VALUES ('B', 1);
INSERT INTO #TestTable VALUES ('B', 2);
INSERT INTO #TestTable VALUES ('B', 2); -- duplicate
INSERT INTO #TestTable VALUES ('C', 1);
INSERT INTO #TestTable VALUES ('C', 2);
SELECT * FROM #TestTable ORDER BY Column1,Column2;
DELETE <your solution here>
SELECT * FROM #TestTable ORDER BY Column1,Column2;
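To experiment without a Sybase server, the harness can be approximated in SQLite through Python's standard `sqlite3` module. This is only a stand-in: SQLite has no `#` temp-table prefix, so the table is simply called `TestTable`, and the helper names `make_test_table` and `summary` are hypothetical. It makes the target explicit: 9 rows but only 6 distinct (Column1, Column2) pairs, so a correct solution deletes exactly 3 rows.

```python
import sqlite3

# The same nine rows as the #TestTable harness (three surplus copies).
ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def make_test_table(conn: sqlite3.Connection) -> None:
    """Create and fill TestTable (plain name: SQLite has no #temp prefix)."""
    conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
    conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)


def summary(conn: sqlite3.Connection) -> tuple:
    """Return (total rows, number of distinct (Column1, Column2) pairs)."""
    total = conn.execute("SELECT COUNT(*) FROM TestTable").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT Column1, Column2 FROM TestTable)"
    ).fetchone()[0]
    return total, distinct


conn = sqlite3.connect(":memory:")
make_test_table(conn)
print(summary(conn))  # (9, 6)
```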
Answer 1:
If all fields are identical, you can just do this:
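-- copy one instance of each distinct row into a temp table
select distinct *
into #temp_table
from table_with_duplicates
-- empty the original table, then copy the unique rows back
delete from table_with_duplicates
insert into table_with_duplicates select * from #temp_table
drop table #temp_table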
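A minimal sketch of the same copy, empty, reload idea, again using SQLite via Python's `sqlite3` as a stand-in for Sybase (SQLite spells `SELECT ... INTO #temp_table` as `CREATE TEMP TABLE ... AS SELECT`). The helper name `dedupe_via_distinct_copy` is hypothetical.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def dedupe_via_distinct_copy(conn: sqlite3.Connection) -> None:
    """Keep one copy of each distinct row by rebuilding the table's contents."""
    # Copy one instance of each distinct row into a temporary table.
    conn.execute("CREATE TEMP TABLE temp_table AS SELECT DISTINCT * FROM TestTable")
    # Empty the original table ...
    conn.execute("DELETE FROM TestTable")
    # ... and copy the de-duplicated rows back.
    conn.execute("INSERT INTO TestTable SELECT * FROM temp_table")
    conn.execute("DROP TABLE temp_table")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)
dedupe_via_distinct_copy(conn)
remaining = conn.execute(
    "SELECT Column1, Column2 FROM TestTable ORDER BY Column1, Column2"
).fetchall()
print(remaining)
```

The trade-off is that the whole table is rewritten: simple and ROWID-free, but potentially expensive on a large table.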
If not all fields are identical (for example, if an id column differs between otherwise duplicate rows), you need to list the columns explicitly in the SELECT statement and hard-code a constant for the id so that the rows become identical, provided you don't care about that column's value. For example:
-- #temp_table must already exist with columns field1, field2, id
insert into #temp_table (field1, field2, id)
select distinct field1, field2, 999
from table_with_duplicates
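A small SQLite sketch (via Python's `sqlite3`, standing in for Sybase) of the hard-coded-id variant. The schema `table_with_duplicates (id, field1, field2)` and the helper `dedupe_ignoring_id` are hypothetical. It also shows the cost: every surviving row gets id 999, so this only works if `id` has no PRIMARY KEY or UNIQUE constraint; with one, the constant would collide as soon as two rows remain.

```python
import sqlite3


def dedupe_ignoring_id(conn: sqlite3.Connection) -> None:
    """Collapse rows that differ only in id, replacing every id with 999."""
    conn.execute(
        "CREATE TEMP TABLE temp_table (field1 TEXT, field2 INTEGER, id INTEGER)"
    )
    # DISTINCT now sees identical rows, because id is replaced by a constant.
    conn.execute(
        "INSERT INTO temp_table (field1, field2, id) "
        "SELECT DISTINCT field1, field2, 999 FROM table_with_duplicates"
    )
    conn.execute("DELETE FROM table_with_duplicates")
    conn.execute(
        "INSERT INTO table_with_duplicates (field1, field2, id) "
        "SELECT field1, field2, id FROM temp_table"
    )
    conn.execute("DROP TABLE temp_table")


conn = sqlite3.connect(":memory:")
# No UNIQUE/PRIMARY KEY on id, otherwise the constant 999 would be rejected.
conn.execute(
    "CREATE TABLE table_with_duplicates (id INTEGER, field1 TEXT, field2 INTEGER)"
)
conn.executemany(
    "INSERT INTO table_with_duplicates VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 1), (3, "B", 2)],
)
dedupe_ignoring_id(conn)
rows = conn.execute(
    "SELECT id, field1, field2 FROM table_with_duplicates ORDER BY field1"
).fetchall()
print(rows)  # [(999, 'A', 1), (999, 'B', 2)]
```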
Answer 2:
This works well and is fast:
DELETE FROM #TestTable
WHERE ROWID(#TestTable) IN (
SELECT rowid FROM (
SELECT ROWID(#TestTable) rowid,
ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1,Column2) rownum
FROM #TestTable
) sub
WHERE rownum > 1
);
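If you're not familiar with OVER(PARTITION BY ...), just execute the inner SELECT statement on its own to see what it does: it numbers the rows within each group of identical (Column1, Column2) values, so every row with rownum > 1 is a surplus copy. Because all rows in a partition are identical, the ORDER BY inside OVER() only has to produce some order; which copy survives doesn't matter.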
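The same approach can be tried in SQLite 3.25 or later (which supports ROW_NUMBER()) through Python's `sqlite3`. Under this substitution, SQLite's built-in `rowid` pseudo-column plays the role of Sybase's `ROWID(#TestTable)` function; the alias `rid` and the helper names are hypothetical. The sketch first runs the inner SELECT on its own, then the full DELETE.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def numbered_rows(conn: sqlite3.Connection) -> list:
    """The inner SELECT: number the rows inside each (Column1, Column2) group."""
    return conn.execute("""
        SELECT rowid AS rid, Column1, Column2,
               ROW_NUMBER() OVER (PARTITION BY Column1, Column2
                                  ORDER BY rowid) AS rownum
        FROM TestTable
        ORDER BY rid
    """).fetchall()


def dedupe_row_number(conn: sqlite3.Connection) -> None:
    """Delete every row whose number within its group is greater than 1."""
    conn.execute("""
        DELETE FROM TestTable
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY Column1, Column2
                                          ORDER BY rowid) AS rownum
                FROM TestTable
            ) AS sub
            WHERE rownum > 1
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)

numbered = numbered_rows(conn)
for row in numbered:
    print(row)  # (rid, Column1, Column2, rownum)

dedupe_row_number(conn)
remaining = conn.execute(
    "SELECT Column1, Column2 FROM TestTable ORDER BY Column1, Column2"
).fetchall()
print(remaining)
```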
Answer 3:
Here is another interesting one I found and adapted:
DELETE FROM #TestTable dupes
FROM #TestTable dupes, #TestTable fullTable
WHERE dupes.Column1 = fullTable.Column1
AND dupes.Column2 = fullTable.Column2
AND ROWID(dupes) > ROWID(fullTable);
Or, if you prefer explicit joins (I do):
DELETE FROM #TestTable dupes
FROM #TestTable dupes
INNER JOIN #TestTable fullTable
ON dupes.Column1 = fullTable.Column1
AND dupes.Column2 = fullTable.Column2
AND ROWID(dupes) > ROWID(fullTable);
Or the short form (a NATURAL JOIN automatically matches all identically named columns):
DELETE FROM #TestTable dupes
FROM #TestTable dupes
NATURAL JOIN #TestTable fullTable
ON ROWID(dupes) > ROWID(fullTable);
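SQLite has no `DELETE ... FROM ... JOIN` syntax, so a sketch of the same self-join logic in SQLite (via Python's `sqlite3`, as a stand-in for Sybase) has to express it as a correlated EXISTS: a row is deleted whenever an identical row with a smaller `rowid` exists. The helper name is hypothetical. The row with the smallest `rowid` in each group never has such a partner, so it always survives; a row matching several smaller copies is simply deleted once.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def dedupe_self_join(conn: sqlite3.Connection) -> None:
    """Delete a row whenever an identical row with a smaller rowid exists."""
    # fullTable plays the same role as in the answer; TestTable.* inside the
    # subquery refers to the outer row being considered for deletion.
    conn.execute("""
        DELETE FROM TestTable
        WHERE EXISTS (
            SELECT 1
            FROM TestTable AS fullTable
            WHERE fullTable.Column1 = TestTable.Column1
              AND fullTable.Column2 = TestTable.Column2
              AND fullTable.rowid < TestTable.rowid
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)
dedupe_self_join(conn)
remaining = conn.execute(
    "SELECT Column1, Column2 FROM TestTable ORDER BY Column1, Column2"
).fetchall()
print(remaining)
```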
...if someone finds a solution that doesn't require ROWID(), I would be interested to see it.
Answer 4:
Please try this. The general form is:
create clustered index i1 on table_name(column_name) with ignore_dup_row
For example:
create table #test(id int,name char(9))
insert into #test values(1,"A")
insert into #test values(1,"A")
create clustered index i1 on #test(id) with ignore_dup_row
select * from #test
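`ignore_dup_row` is a Sybase index option with no SQLite equivalent. A loosely analogous technique, and plainly a swapped-in one, is to copy the rows into a table whose UNIQUE constraint spans every column and let SQLite's `INSERT OR IGNORE` skip the duplicates. The sketch below (Python `sqlite3`; the table `test_dedup` and the helper name are hypothetical) needs a table rebuild, unlike the index option, and because SQLite treats NULLs as distinct in UNIQUE constraints, rows containing NULL would not be collapsed.

```python
import sqlite3


def rebuild_without_duplicates(conn: sqlite3.Connection) -> None:
    """Copy `test` into a table with UNIQUE over all columns, skipping duplicates."""
    conn.execute(
        "CREATE TABLE test_dedup (id INTEGER, name TEXT, UNIQUE (id, name))"
    )
    # Rows that would violate the UNIQUE constraint are silently ignored.
    conn.execute("INSERT OR IGNORE INTO test_dedup (id, name) SELECT id, name FROM test")
    conn.execute("DROP TABLE test")
    conn.execute("ALTER TABLE test_dedup RENAME TO test")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, "A"), (1, "A")])
rebuild_without_duplicates(conn)
rows = conn.execute("SELECT * FROM test").fetchall()
print(rows)  # [(1, 'A')]
```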
Answer 5:
OK, now that I know the ROWID() function, solutions written for tables with a primary key (PK) can easily be adapted. This one first selects all rows to keep and then deletes the remaining ones:
DELETE FROM #TestTable
FROM #TestTable
LEFT OUTER JOIN (
SELECT MIN(ROWID(#TestTable)) rowid
FROM #TestTable
GROUP BY Column1, Column2
) AS KeepRows ON ROWID(#TestTable) = KeepRows.rowid
WHERE KeepRows.rowid IS NULL;
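A sketch of the keep-rows idea in SQLite via Python's `sqlite3`, standing in for Sybase. SQLite cannot DELETE through a join, so the LEFT OUTER JOIN moves into a subquery that lists the rowids to delete; `rowid` is SQLite's pseudo-column, and the alias `keep_rowid` and the helper name are hypothetical. With the test data, the first copy of each pair survives, i.e. rowids 1, 4, 5, 6, 8 and 9.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def dedupe_keep_rows(conn: sqlite3.Connection) -> None:
    """Delete every row whose rowid is not the smallest in its group."""
    conn.execute("""
        DELETE FROM TestTable
        WHERE rowid IN (
            SELECT t.rowid
            FROM TestTable AS t
            LEFT OUTER JOIN (
                -- one row to keep per (Column1, Column2) group
                SELECT MIN(rowid) AS keep_rowid
                FROM TestTable
                GROUP BY Column1, Column2
            ) AS KeepRows ON t.rowid = KeepRows.keep_rowid
            WHERE KeepRows.keep_rowid IS NULL
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)
dedupe_keep_rows(conn)
kept = [r[0] for r in conn.execute("SELECT rowid FROM TestTable ORDER BY rowid")]
print(kept)  # [1, 4, 5, 6, 8, 9]
```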
...or how about this shorter variant? I like it!
DELETE FROM #TestTable
WHERE ROWID(#TestTable) NOT IN (
SELECT MIN(ROWID(#TestTable))
FROM #TestTable
GROUP BY Column1, Column2
);
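The shorter NOT IN variant carries over to SQLite almost verbatim (sketch via Python's `sqlite3`, standing in for Sybase; `ROWID(#TestTable)` becomes the `rowid` pseudo-column, and the helper name is hypothetical). Since `MIN(rowid)` is never NULL, the usual NOT IN pitfall with NULLs in the subquery does not arise here.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def dedupe_not_in(conn: sqlite3.Connection) -> None:
    """Keep the smallest rowid of each group; delete everything else."""
    conn.execute("""
        DELETE FROM TestTable
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM TestTable
            GROUP BY Column1, Column2
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", ROWS)
dedupe_not_in(conn)
remaining = conn.execute(
    "SELECT Column1, Column2 FROM TestTable ORDER BY Column1, Column2"
).fetchall()
print(remaining)
```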
The post that inspired me most has a comment saying that NOT IN might be slower. But that remark was about SQL Server, and sometimes elegance is more important :) I also think it all depends on good indexes.
Anyway, a table without a PK is usually bad design. You should at least add an "autoinc" ID; if you do, you can use that ID instead of the ROWID() function, which is a non-standard Sybase extension (some other databases have it, too).
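A sketch of the same NOT IN query keyed on an auto-increment ID instead of ROWID(), again in SQLite via Python's `sqlite3` as a stand-in. It assumes the `id` column exists from the start, since SQLite's ALTER TABLE ADD COLUMN cannot add a PRIMARY KEY column; the helper name is hypothetical.

```python
import sqlite3

ROWS = [("A", 1), ("A", 1), ("A", 1), ("A", 2), ("B", 1),
        ("B", 2), ("B", 2), ("C", 1), ("C", 2)]


def dedupe_by_id(conn: sqlite3.Connection) -> None:
    """Keep the lowest id of each (Column1, Column2) group."""
    conn.execute("""
        DELETE FROM TestTable
        WHERE id NOT IN (
            SELECT MIN(id) FROM TestTable GROUP BY Column1, Column2
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TestTable (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- the "autoinc" ID
        Column1 TEXT,
        Column2 INTEGER
    )
""")
conn.executemany("INSERT INTO TestTable (Column1, Column2) VALUES (?, ?)", ROWS)
dedupe_by_id(conn)
kept_ids = [r[0] for r in conn.execute("SELECT id FROM TestTable ORDER BY id")]
print(kept_ids)  # [1, 4, 5, 6, 8, 9]
```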
Source: https://stackoverflow.com/questions/19544489/how-to-delete-duplicate-rows-in-sybase-when-you-have-no-unique-key