问题
How do I Delete duplicated rows in one Table and update References in another table to the remaining row? The duplication only occurs in the name. The Id Columns are Identity columns.
Example:
Assume we have two tables Doubles
and Data
.
Doubles table (
Id int,
Name varchar(50)
)
Data Table (
Id int,
DoublesId int
)
Now I Have Two entries in the Doubls table:
Id Name
1 Foo
2 Foo
And two entries in the Data Table:
ID DoublesId
1 1
2 2
At the end there should be only one entry in the Doubles Table:
Id Name
1 Foo
And two entries in the Data Table:
Id DoublesId
1 1
2 1
In the doubles Table there can be any number of duplicated rows per name (up to 30) and also regular 'single' rows.
回答1:
I've not run this, but hopefully it should be correct, and close enough to the final soln to get you there. Let me know any mistakes if you like and I'll update the answer.
--updates the data table to the min ids for each name
update Data
set id = final_id
from
Data
join
Doubles
on Doubles.id = Data.id
join
(
select
name
min(id) as final_id
from Doubles
group by name
) min_ids
on min_ids.name = Doubles.name
--deletes redundant ids from the Doubles table
delete
from Doubles
where id not in
(
select
min(id) as final_id
from Doubles
group by name
)
回答2:
Note: I have taken the liberty to rename your Id's to DoubleID and DataID respectively. I find that eassier to work with.
DECLARE @Doubles TABLE (DoubleID INT, Name VARCHAR(50))
DECLARE @Data TABLE (DataID INT, DoubleID INT)
INSERT INTO @Doubles VALUES (1, 'Foo')
INSERT INTO @Doubles VALUES (2, 'Foo')
INSERT INTO @Doubles VALUES (3, 'Bar')
INSERT INTO @Doubles VALUES (4, 'Bar')
INSERT INTO @Data VALUES (1, 1)
INSERT INTO @Data VALUES (1, 2)
INSERT INTO @Data VALUES (1, 3)
INSERT INTO @Data VALUES (1, 4)
SELECT * FROM @Doubles
SELECT * FROM @Data
UPDATE @Data
SET DoubleID = MinDoubleID
FROM @Data dt
INNER JOIN @Doubles db ON db.DoubleID = dt.DoubleID
INNER JOIN (
SELECT db.Name, MinDoubleID = MIN(db.DoubleID)
FROM @Doubles db
GROUP BY db.Name
) dbmin ON dbmin.Name = db.Name
/* Kudos to quassnoi */
;WITH q AS (
SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS rn
FROM @Doubles
)
DELETE
FROM q
WHERE rn > 1
SELECT * FROM @Doubles
SELECT * FROM @Data
回答3:
Take a look at this one, i have tried this, working fine
--create table Doubles ( Id int, Name varchar(50))
--create table Data( Id int, DoublesId int)
--select * from doubles
--select * from data
Declare @NonDuplicateID int
Declare @NonDuplicateName varchar(max)
DECLARE @sqlQuery nvarchar(max)
DECLARE DeleteDuplicate CURSOR FOR
SELECT Max(id),name AS SingleID FROM Doubles
GROUP BY [NAME]
OPEN DeleteDuplicate
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID, @NonDuplicateName
--Fetch next record
WHILE @@FETCH_STATUS = 0
BEGIN
--select b.ID , b.DoublesID, a.[name],a.id asdasd
--from doubles a inner join data b
--on
--a.ID=b.DoublesID
--where b.DoublesID<>@NonDuplicateID
--and a.[name]=@NonDuplicateName
print '---------------------------------------------';
select
@sqlQuery =
'update b
set b.DoublesID=' + cast(@NonDuplicateID as varchar(50)) + '
from
doubles a
inner join
data b
on
a.ID=b.DoublesID
where b.DoublesID<>' + cast(@NonDuplicateID as varchar(50)) +
' and a.[name]=''' + cast(@NonDuplicateName as varchar(max)) +'''';
print @sqlQuery
exec sp_executeSQL @sqlQuery
print '---------------------------------------------';
-- now move the cursor
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID ,@NonDuplicateName
END
CLOSE DeleteDuplicate --Close cursor
DEALLOCATE DeleteDuplicate --Deallocate cursor
---- Delete duplicate rows from original table
DELETE
FROM doubles
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM doubles
GROUP BY [NAME]
)
Please try and let me know if this helped you
Thanks
~ Aamod
回答4:
If you are using MYSQL following worked for me. I did it for 2 steps
- Step 1 -> Update all Data rows to one Double table reference (with lowest id)
- Step 2 -> Delete all duplicates with keeping lowest id
Step 1 ->
update Data
join
Doubles
on Data.DoublesId = Doubles.id
join
(
select name, min(id) as final_id
from Doubles
group by name
) min_ids
on min_ids.name = Doubles.name
set DoublesId = min_ids.final_id;
Step 2 ->
DELETE c1 FROM Doubles c1
INNER JOIN Doubles c2
WHERE
c1.id > c2.id AND
c1.name = c2.name;
来源:https://stackoverflow.com/questions/1497183/delete-duplicated-rows-and-update-references