Delete duplicated rows and Update references

与世无争的帅哥 提交于 2020-05-25 17:07:18

问题


How do I Delete duplicated rows in one Table and update References in another table to the remaining row? The duplication only occurs in the name. The Id Columns are Identity columns.

Example:

Assume we have two tables Doubles and Data.

Doubles table (
   Id int,
   Name varchar(50)
)

Data Table (
    Id int,
    DoublesId int
)

Now I Have Two entries in the Doubls table:

Id Name
1  Foo
2  Foo

And two entries in the Data Table:

ID DoublesId
1  1
2  2

At the end there should be only one entry in the Doubles Table:

Id Name
1  Foo

And two entries in the Data Table:

Id DoublesId
1  1
2  1 

In the doubles Table there can be any number of duplicated rows per name (up to 30) and also regular 'single' rows.


回答1:


I've not run this, but hopefully it should be correct, and close enough to the final soln to get you there. Let me know any mistakes if you like and I'll update the answer.

--updates the data table to the min ids for each name
update Data
set id = final_id
from
  Data
join
  Doubles 
on Doubles.id = Data.id
join
(
  select 
    name
    min(id) as final_id
  from Doubles
  group by name
) min_ids
on min_ids.name = Doubles.name

--deletes redundant ids from the Doubles table
delete 
from Doubles
where id not in
(
  select 
    min(id) as final_id
  from Doubles
  group by name
)



回答2:


Note: I have taken the liberty to rename your Id's to DoubleID and DataID respectively. I find that eassier to work with.

DECLARE @Doubles TABLE (DoubleID INT, Name VARCHAR(50))
DECLARE @Data TABLE (DataID INT, DoubleID INT)

INSERT INTO @Doubles VALUES (1, 'Foo')
INSERT INTO @Doubles VALUES (2, 'Foo')
INSERT INTO @Doubles VALUES (3, 'Bar')
INSERT INTO @Doubles VALUES (4, 'Bar')

INSERT INTO @Data VALUES (1, 1)
INSERT INTO @Data VALUES (1, 2)
INSERT INTO @Data VALUES (1, 3)
INSERT INTO @Data VALUES (1, 4)

SELECT * FROM @Doubles
SELECT * FROM @Data

UPDATE @Data
SET DoubleID = MinDoubleID
FROM    @Data dt
        INNER JOIN @Doubles db ON db.DoubleID = dt.DoubleID
        INNER JOIN (
            SELECT    db.Name, MinDoubleID = MIN(db.DoubleID)
            FROM      @Doubles db
            GROUP BY  db.Name
        ) dbmin ON dbmin.Name = db.Name

/* Kudos to quassnoi */
;WITH q AS (
    SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS rn
    FROM @Doubles
    )
DELETE  
FROM    q
WHERE   rn > 1

SELECT * FROM @Doubles
SELECT * FROM @Data



回答3:


Take a look at this one, i have tried this, working fine

--create table Doubles  (   Id int,   Name varchar(50))

--create table Data(    Id int,    DoublesId int)

--select * from doubles
--select * from data

Declare @NonDuplicateID int
Declare @NonDuplicateName varchar(max)
DECLARE @sqlQuery nvarchar(max)
DECLARE DeleteDuplicate CURSOR FOR

SELECT Max(id),name AS SingleID FROM Doubles
GROUP BY [NAME]

OPEN DeleteDuplicate 
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID, @NonDuplicateName
--Fetch next record
WHILE @@FETCH_STATUS = 0
BEGIN

--select b.ID , b.DoublesID, a.[name],a.id asdasd
--from doubles a    inner join  data b
--on
--a.ID=b.DoublesID
--where b.DoublesID<>@NonDuplicateID
--and a.[name]=@NonDuplicateName

print '---------------------------------------------';

select 
@sqlQuery = 
      'update b
        set b.DoublesID=' + cast(@NonDuplicateID as varchar(50)) + '
        from 
        doubles a
            inner join
        data b
            on
        a.ID=b.DoublesID
        where b.DoublesID<>' +  cast(@NonDuplicateID as varchar(50)) +
        ' and a.[name]=''' +  cast(@NonDuplicateName as varchar(max)) +'''';

print @sqlQuery
exec sp_executeSQL @sqlQuery
print '---------------------------------------------';

-- now move the cursor
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID ,@NonDuplicateName
END

CLOSE DeleteDuplicate --Close cursor
DEALLOCATE DeleteDuplicate --Deallocate cursor

---- Delete duplicate rows from original table
DELETE
FROM doubles
WHERE ID NOT IN
(
    SELECT MAX(ID)
    FROM doubles
    GROUP BY [NAME]
)

Please try and let me know if this helped you

Thanks

~ Aamod




回答4:


If you are using MYSQL following worked for me. I did it for 2 steps

  • Step 1 -> Update all Data rows to one Double table reference (with lowest id)
  • Step 2 -> Delete all duplicates with keeping lowest id

Step 1 ->

update Data 
join
  Doubles 
on Data.DoublesId = Doubles.id
join
(
  select name, min(id) as final_id
  from Doubles
  group by name
) min_ids
on min_ids.name = Doubles.name
set DoublesId = min_ids.final_id;

Step 2 ->

DELETE c1 FROM Doubles c1
INNER JOIN Doubles c2 
WHERE
    c1.id > c2.id AND 
    c1.name = c2.name;


来源:https://stackoverflow.com/questions/1497183/delete-duplicated-rows-and-update-references

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!