How do I update a table that references duplicate records?

Question

I have two SQL tables. Table A gets a reference value (RefID) from Table B, which stores a list of modules and their IDs. The module names in Table B are not unique. I am trying to remove the duplicates from Table B, but I'm not sure how to update Table A so that it references only the single remaining values.

Example:

Table A:                                      Table B:

--------------------------------            ------------------------------------
ID      Description      RefID               ID            Name       
--------------------------------            ------------------------------------
1       Test 1           2                   1            QuickReports
--------------------------------            ------------------------------------
2       Test 2           1                   2            QuickReports
--------------------------------            ------------------------------------

I want the results to be the following:

Table A:                                      Table B:

--------------------------------            ------------------------------------
ID      Description      RefID               ID            Name       
--------------------------------            ------------------------------------
1       Test 1           1                   1            QuickReports
--------------------------------            ------------------------------------
2       Test 2           1                  
--------------------------------

I managed to delete the duplicates from Table B using the code below, but I haven't been able to update the records in Table A. Both tables have over 500 records.

WITH cte AS (
    SELECT
        Name,
        ROW_NUMBER() OVER (
            PARTITION BY Name
            ORDER BY Name
        ) AS row_num
    FROM ReportmodulesTest
)
DELETE FROM cte
WHERE row_num > 1;

Answer 1:

You would need to update table A first, before deleting from table B.

You tagged your question MySQL but that database would not support the delete statement that you are showing. I suspect that you are running SQL Server, so here is how to do it in that database:

update a
set refid = b.minid
from tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b
    on b.id = a.refid and b.minid <> a.refid

In MySQL (8.0 or later, which the window function requires), you would phrase the same query as:

update tablea a
inner join (select name, id, min(id) over(partition by name) minid from tableb) b on b.id = a.refid
set a.refid = b.minid
where b.minid <> a.refid
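The advice to update table A before deleting from table B can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module rather than SQL Server or MySQL, purely so it runs without a server. The foreign key on refid is an assumption (the question does not say whether one exists); with it in place, deleting the duplicate first is rejected outright. Without such a constraint the delete would succeed, but table A would be left pointing at a row that no longer exists, and the name needed to remap it would be gone.

```python
import sqlite3

# Hypothetical schema mirroring the question's example. The foreign key is
# an assumption added here to make the ordering problem visible.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tablea (
        id INTEGER PRIMARY KEY,
        description TEXT,
        refid INTEGER REFERENCES tableb(id)
    );
    INSERT INTO tableb VALUES (1, 'QuickReports'), (2, 'QuickReports');
    INSERT INTO tablea VALUES (1, 'Test 1', 2), (2, 'Test 2', 1);
""")

# Wrong order: tablea row 1 still references tableb id 2, so this is rejected.
try:
    conn.execute("DELETE FROM tableb WHERE id = 2")
    delete_first_failed = False
except sqlite3.IntegrityError:
    delete_first_failed = True
print(delete_first_failed)  # True

# Right order: repoint the reference, then remove the duplicate.
conn.execute("UPDATE tablea SET refid = 1 WHERE refid = 2")
conn.execute("DELETE FROM tableb WHERE id = 2")
conn.commit()
```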

Answer 2:

You can update the first table using:

update a join
       (select b.*,
               min(id) over (partition by name) as min_id
        from b
       ) b
       on a.refid = b.id
    set a.refid = b.min_id
    where a.refid <> b.min_id;

Then you can delete rows in the second table with similar logic:

delete b
    from b join
         (select b.*,
                 min(id) over (partition by name) as min_id
          from b
         ) bb
         on bb.id = b.id
    where bb.id <> bb.min_id;
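The two steps above (remap, then delete) can be tried end to end on the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module, not the MySQL syntax shown in this answer. It swaps the window function for a plain MIN(id) lookup per name so that it also runs on older SQLite builds. The table names a and b follow this answer, and the third row in each table is made up for illustration, to show that non-duplicated rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, description TEXT, refid INTEGER);
    -- Rows 1 and 2 come from the question; row 3 is illustrative only.
    INSERT INTO b VALUES (1, 'QuickReports'), (2, 'QuickReports'), (3, 'Other');
    INSERT INTO a VALUES (1, 'Test 1', 2), (2, 'Test 2', 1), (3, 'Test 3', 3);
""")

with conn:  # one transaction: commit both steps together or roll back both
    # Step 1: point every reference at the lowest id that shares its name.
    conn.execute("""
        UPDATE a
        SET refid = (
            SELECT MIN(b2.id)
            FROM b AS b1
            JOIN b AS b2 ON b2.name = b1.name
            WHERE b1.id = a.refid
        )
        WHERE refid IN (SELECT id FROM b)
    """)
    # Step 2: drop every row that is not the lowest id for its name.
    conn.execute("""
        DELETE FROM b
        WHERE id NOT IN (SELECT MIN(id) FROM b GROUP BY name)
    """)

print(conn.execute("SELECT id, refid FROM a ORDER BY id").fetchall())
# [(1, 1), (2, 1), (3, 3)]
print(conn.execute("SELECT id, name FROM b ORDER BY id").fetchall())
# [(1, 'QuickReports'), (3, 'Other')]
```

Running both statements in a single transaction means a failure in the delete also undoes the remapping, so the tables are never left half-migrated.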

Answer 3:

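I found a solution that has made this process easier. Note that in this answer Table A is the table holding the duplicated names and Table B is the table holding the RefId column. I first use ROW_NUMBER to find duplicates in Table A and SELECT INTO a temporary table.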

SELECT
       a.Id
     , a.Name
     , ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id DESC) RN
INTO
     #TestTable
FROM
     TableA a WITH(NOLOCK)

I then JOIN Table A and Table B to see where the IDs match and to identify which ID I need to keep and which IDs I need to delete:

SELECT
       b.Id
     , b.Name
     , b.RefId
     , ToKeep.Id   KeepId
     , ToDelete.Id DeleteId
FROM
     #TestTable ToDelete
     JOIN TableB b WITH(NOLOCK)
        ON b.RefId = ToDelete.Id
     JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
           AND ToKeep.RN = 1
WHERE ToDelete.RN > 1

Then using a similar statement, I just update the records:

UPDATE b
SET
    b.RefId = ToKeep.Id
FROM #TestTable ToDelete
     JOIN TableB b
        ON b.RefId = ToDelete.Id
     JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
           AND ToKeep.RN = 1
WHERE
      ToDelete.RN > 1

Lastly, I can now delete the duplicate records:

DELETE a
FROM #TestTable b
     INNER JOIN TableA a
        ON b.Id = a.Id
WHERE
      b.RN > 1

After this, you can rerun the first SELECT statement, with the INTO #TestTable clause removed, to confirm that all duplicates have been deleted: no row should come back with RN greater than 1.
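The verification step can be extended to also catch references that were missed by the update. The sketch below is one possible check, written with Python's built-in sqlite3 module instead of SQL Server so it runs standalone. It uses the table and column names from the queries in this answer; the helper name remaining_problems and the sample rows are hypothetical.

```python
import sqlite3

def remaining_problems(conn):
    """Return (duplicate_names, orphaned_refs); both are empty after a clean run."""
    # Names that still occur more than once in TableA.
    duplicate_names = conn.execute(
        "SELECT Name, COUNT(*) FROM TableA GROUP BY Name HAVING COUNT(*) > 1"
    ).fetchall()
    # TableB rows whose RefId points at a TableA row that no longer exists.
    orphaned_refs = conn.execute(
        "SELECT b.Id, b.RefId FROM TableB b "
        "LEFT JOIN TableA a ON a.Id = b.RefId "
        "WHERE b.RefId IS NOT NULL AND a.Id IS NULL"
    ).fetchall()
    return duplicate_names, orphaned_refs

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TableB (Id INTEGER PRIMARY KEY, Name TEXT, RefId INTEGER);
    -- Hypothetical state after a cleanup that kept the highest Id (2).
    INSERT INTO TableA VALUES (2, 'QuickReports');
    INSERT INTO TableB VALUES (1, 'Test 1', 2), (2, 'Test 2', 2);
""")

print(remaining_problems(conn))  # ([], [])
```

A non-empty second list would mean the DELETE ran against rows that TableB still referenced, which is the situation the update step is meant to prevent.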

Thanks to an anonymous colleague of mine for this solution. I hope it helps someone out there.

Source: https://stackoverflow.com/questions/65340539/how-do-i-update-a-table-that-references-duplicate-records

Tags

sql

sql-server

duplicates

sql-update

inner-join