Question
I have two SQL tables. Table A stores a reference (RefID) to Table B, which holds a list of modules and their IDs. The module names are not unique. I am trying to remove the duplicates from Table B, but I'm not sure how to update Table A so that it references only the rows that remain.
Example:
Table A:
--------------------------------
ID    Description    RefID
--------------------------------
1     Test 1         2
2     Test 2         1
--------------------------------

Table B:
------------------------------------
ID    Name
------------------------------------
1     QuickReports
2     QuickReports
------------------------------------
I want the results to be the following:
Table A:
--------------------------------
ID    Description    RefID
--------------------------------
1     Test 1         1
2     Test 2         1
--------------------------------

Table B:
------------------------------------
ID    Name
------------------------------------
1     QuickReports
------------------------------------
I managed to delete the duplicates from Table B using the code below, but I haven't been able to update the records in Table A. Each table has over 500 records.
WITH cte AS (
    SELECT
        Name,
        ROW_NUMBER() OVER (
            PARTITION BY Name
            ORDER BY Name
        ) row_num
    FROM ReportmodulesTest
)
DELETE FROM cte
WHERE row_num > 1;
Answer 1:
You would need to update table A first, before deleting from table B.
You tagged your question MySQL, but that database does not support the delete statement that you are showing. I suspect that you are running SQL Server, so here is how to do it in that database:
update a
set refid = b.minid
from tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b
    on b.id = a.refid and b.minid <> a.refid
In MySQL, you would phrase the same query as:
-- window functions require MySQL 8.0 or later
update tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b on b.id = a.refid
set a.refid = b.minid
where b.minid <> a.refid
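To check the update-then-delete order against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example runs without a server; it is neither of the databases discussed above, and the statements use portable correlated subqueries rather than the join syntax shown in this answer. The helper name dedupe is hypothetical.

```python
import sqlite3


def dedupe(conn):
    """Repoint tablea.refid to the lowest id per name, then drop the extra tableb rows."""
    # Step 1: repoint the references while the duplicate rows still exist.
    conn.execute("""
        UPDATE tablea
        SET refid = (
            SELECT MIN(b2.id)
            FROM tableb b1
            JOIN tableb b2 ON b2.name = b1.name
            WHERE b1.id = tablea.refid
        )
        WHERE refid IN (SELECT id FROM tableb)
    """)
    # Step 2: delete every row that is not the lowest id for its name.
    conn.execute("""
        DELETE FROM tableb
        WHERE id NOT IN (SELECT MIN(id) FROM tableb GROUP BY name)
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, description TEXT, refid INTEGER);
    INSERT INTO tableb VALUES (1, 'QuickReports'), (2, 'QuickReports');
    INSERT INTO tablea VALUES (1, 'Test 1', 2), (2, 'Test 2', 1);
""")
dedupe(conn)
rows_a = conn.execute("SELECT id, description, refid FROM tablea ORDER BY id").fetchall()
rows_b = conn.execute("SELECT id, name FROM tableb ORDER BY id").fetchall()
print(rows_a)
print(rows_b)
```

With the two sample rows, both records in tablea end up with refid 1 and only row 1 remains in tableb, which matches the result the question asks for. Running the delete first would leave the row with refid 2 pointing at nothing.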
Answer 2:
You can update the first table using:
update a join
       (select b.*,
               min(id) over (partition by name) as min_id
        from b
       ) b
       on a.refid = b.id
    set a.refid = b.min_id
    where a.refid <> b.min_id;
Then you can delete rows in the second table with similar logic:
delete b
    from b join
         (select b.*,
                 min(id) over (partition by name) as min_id
          from b
         ) bb
         on bb.id = b.id
    where b.id <> bb.min_id;
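Both statements rely on the same derived table, which tags every row of b with the lowest id that shares its name. A small sketch of what that mapping looks like, again assuming SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later); the 'Billing' row is hypothetical sample data added to show a name with no duplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    -- 'Billing' is a made-up row, included only to show a name without duplicates
    INSERT INTO b VALUES (1, 'QuickReports'), (2, 'QuickReports'), (3, 'Billing');
""")

# Same window expression as in the answer: the smallest id within each name.
mapping = conn.execute("""
    SELECT id, name, MIN(id) OVER (PARTITION BY name) AS min_id
    FROM b
    ORDER BY id
""").fetchall()

# Rows whose id differs from min_id are the duplicates to repoint and then delete.
duplicates = [(row_id, min_id) for row_id, _, min_id in mapping if row_id != min_id]
print(mapping)
print(duplicates)
```

Here only id 2 differs from its min_id, so it is the one row that the update repoints away from and the delete removes.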
Answer 3:
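I found a solution that has made this process easier. I first use ROW_NUMBER to find the duplicates in Table A (in this answer, Table A is the table that holds the duplicated names and Table B is the one that references it) and SELECT INTO a temporary table.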
SELECT
      a.Id
    , a.Name
    , ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id DESC) RN
INTO
    #TestTable
FROM
    TableA a WITH(NOLOCK)
I then JOIN the temporary table (built from Table A) to Table B to see where the IDs match, and to identify which ID I need to keep and which IDs I need to delete:
SELECT
      b.Id
    , b.Name
    , b.RefId
    , ToKeep.Id KeepId
    , ToDelete.Id DeleteId
FROM
    #TestTable ToDelete
    JOIN TableB b WITH(NOLOCK)
        ON b.RefId = ToDelete.Id
    JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
        AND ToKeep.RN = 1
WHERE ToDelete.RN > 1
Then using a similar statement, I just update the records:
UPDATE b
SET
    b.RefId = ToKeep.Id
FROM #TestTable ToDelete
    -- NOLOCK dropped here: the hint is deprecated on the target table of an UPDATE
    JOIN TableB b
        ON b.RefId = ToDelete.Id
    JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
        AND ToKeep.RN = 1
WHERE
    ToDelete.RN > 1
Lastly, I can now delete the duplicate records:
DELETE a
FROM #TestTable b
    INNER JOIN TableA a
        ON b.Id = a.Id
WHERE
    b.RN > 1
After this, you can rerun the first SELECT statement to confirm that all duplicates have been deleted. Just remove the INTO clause so that it no longer writes to the temporary table.
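As a complement to rerunning that SELECT, here is a rough sketch of a post-cleanup check. It assumes SQLite through Python's sqlite3 module rather than SQL Server, uses this answer's naming (TableA holds the names, TableB holds RefId), and adds an orphan check that is not part of the original steps. The function name check_cleanup is hypothetical.

```python
import sqlite3


def check_cleanup(conn):
    """Return (names still duplicated in TableA, TableB rows whose RefId matches no TableA row)."""
    duplicate_names = conn.execute("""
        SELECT Name, COUNT(*) FROM TableA GROUP BY Name HAVING COUNT(*) > 1
    """).fetchall()
    orphans = conn.execute("""
        SELECT b.Id, b.RefId
        FROM TableB b
        LEFT JOIN TableA a ON a.Id = b.RefId
        WHERE a.Id IS NULL
    """).fetchall()
    return duplicate_names, orphans


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TableB (Id INTEGER PRIMARY KEY, Name TEXT, RefId INTEGER);
    -- State after deleting the duplicate WITHOUT updating TableB first:
    INSERT INTO TableA VALUES (2, 'QuickReports');
    INSERT INTO TableB VALUES (1, 'Test 1', 2), (2, 'Test 2', 1);
""")
before = check_cleanup(conn)

# Repointing the stale reference to the kept row clears the orphan.
conn.execute("UPDATE TableB SET RefId = 2 WHERE RefId = 1")
after = check_cleanup(conn)
print(before)
print(after)
```

The first check reports the TableB row that still points at the deleted Id 1; after the reference is repointed, both lists come back empty.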
Thanks to an anonymous colleague of mine for this solution. I hope it helps someone out there.
Source: https://stackoverflow.com/questions/65340539/how-do-i-update-a-table-that-references-duplicate-records