How do I update a table that references duplicate records?

Submitted by 天涯浪子 on 2021-01-29 06:17:54

Question


I have two SQL tables. One gets a reference value from the other, which stores a list of modules and their IDs, but the module names are not unique. I am trying to remove the duplicates from Table B, but I'm not sure how to update Table A so that it references only the rows that remain.

Example:

Table A:                                      Table B:

--------------------------------            ------------------------------------
ID      Description      RefID               ID            Name       
--------------------------------            ------------------------------------
1       Test 1           2                   1            QuickReports
--------------------------------            ------------------------------------
2       Test 2           1                   2            QuickReports
--------------------------------            ------------------------------------

I want the results to be the following:

Table A:                                      Table B:

--------------------------------            ------------------------------------
ID      Description      RefID               ID            Name       
--------------------------------            ------------------------------------
1       Test 1           1                   1            QuickReports
--------------------------------            ------------------------------------
2       Test 2           1                  
--------------------------------        

I managed to delete the duplicates from Table B using the code below, but I haven't been able to update the records in Table A. Each table has over 500 records.

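WITH cte AS (
    SELECT
        Name,
        -- Number the rows within each group of identical names.
        -- Ordering by Name (the partition key) does not fix which row gets row_num = 1.
        ROW_NUMBER() OVER (
            PARTITION BY Name
            ORDER BY Name
        ) AS row_num
    FROM ReportmodulesTest
)
-- Keep one row per name and delete the rest.
DELETE FROM cte
WHERE row_num > 1;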

Answer 1:


You would need to update table A first, before deleting from table B; otherwise the rows in table A would be left pointing at IDs that no longer exist.

You tagged your question MySQL but that database would not support the delete statement that you are showing. I suspect that you are running SQL Server, so here is how to do it in that database:

-- Repoint every reference at the lowest ID that shares the same name.
update a
set refid = b.minid
from tablea a
inner join (
    select name, id, min(id) over(partition by name) as minid
    from tableb
) b
    on b.id = a.refid and b.minid <> a.refid

In MySQL, you would phrase the same query as:

-- Window functions such as MIN() OVER require MySQL 8.0 or later.
update tablea a
inner join (
    select name, id, min(id) over(partition by name) as minid
    from tableb
) b on b.id = a.refid
set a.refid = b.minid
where b.minid <> a.refid
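
The order of operations described above (repoint the references, then delete the duplicates) can be tried on the two sample rows from the question. What follows is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server or MySQL, and the function name `dedupe` and the lowercase table names `tablea` and `tableb` are made up for the illustration. It uses MIN(id) in plain subqueries instead of a window function so that it also runs on older SQLite builds.

```python
import sqlite3


def dedupe(conn):
    """Repoint tablea.refid at the lowest id per name, then drop the extra tableb rows."""
    with conn:  # one transaction: both statements commit together or not at all
        # Step 1: update the referencing table while the duplicates still exist.
        conn.execute(
            """
            UPDATE tablea
            SET refid = (
                SELECT MIN(keep.id)
                FROM tableb AS dup
                JOIN tableb AS keep ON keep.name = dup.name
                WHERE dup.id = tablea.refid
            )
            WHERE refid IN (SELECT id FROM tableb)
            """
        )
        # Step 2: only now delete the duplicates, keeping the lowest id per name.
        conn.execute(
            "DELETE FROM tableb "
            "WHERE id NOT IN (SELECT MIN(id) FROM tableb GROUP BY name)"
        )


# Sample data copied from the question (assumed column types).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, description TEXT, refid INTEGER);
    INSERT INTO tableb VALUES (1, 'QuickReports'), (2, 'QuickReports');
    INSERT INTO tablea VALUES (1, 'Test 1', 2), (2, 'Test 2', 1);
    """
)
dedupe(conn)
rows_a = conn.execute("SELECT id, description, refid FROM tablea ORDER BY id").fetchall()
rows_b = conn.execute("SELECT id, name FROM tableb ORDER BY id").fetchall()
print(rows_a)  # [(1, 'Test 1', 1), (2, 'Test 2', 1)]
print(rows_b)  # [(1, 'QuickReports')]
```

Both rows of table A end up with RefID 1 and table B keeps a single QuickReports row, which is the result the question asks for. Running the two statements in the opposite order would delete ID 2 before the row that references it could be repointed.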



Answer 2:


You can update the first table using:

update a join
       (select b.*,
               min(id) over (partition by name) as min_id
        from b
       ) b
       on a.refid = b.id
    set a.refid = b.min_id
    where a.refid <> b.min_id;

Then you can delete rows in the second table with similar logic:

delete b
    from b join
         (select b.*,
                 min(id) over (partition by name) as min_id
          from b
         ) bb
         on bb.id = b.id
    where bb.id <> bb.min_id;



Answer 3:


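I found a solution that has made this process easier. Note that the table names here are the reverse of the question's example: TableA holds the names with duplicates and TableB holds the RefId that points at them. I first use ROW_NUMBER to find the duplicates in Table A and SELECT INTO a temporary table.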

SELECT
       a.Id
     , a.Name
     , ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id DESC) RN
INTO
     #TestTable
FROM
     TableA a WITH(NOLOCK)

I then JOIN the temporary table to Table B to see where the IDs match and to identify which ID I need to keep and which IDs I need to delete:

SELECT
       b.Id
     , b.Name
     , b.RefId
     , ToKeep.Id   KeepId
     , ToDelete.Id DeleteId
FROM
     #TestTable ToDelete
     JOIN TableB b WITH(NOLOCK)
        ON b.RefId = ToDelete.Id
     JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
           AND ToKeep.RN = 1
WHERE ToDelete.RN > 1
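
The keep/delete pairing that this query previews can be reproduced on a small data set. The sketch below is an illustration only: it uses Python's built-in sqlite3 module with an in-memory SQLite database in place of SQL Server, the function name `id_mapping` is hypothetical, and the 'Charts' row is invented sample data. It keeps the highest Id per name, mirroring ORDER BY Id DESC with RN = 1, but uses MAX(id) with GROUP BY instead of ROW_NUMBER and a temporary table.

```python
import sqlite3


def id_mapping(conn):
    """Return (delete_id, keep_id) pairs: each duplicate id and the id that replaces it."""
    return conn.execute(
        """
        SELECT dup.id AS delete_id, keep.keep_id
        FROM tablea AS dup
        JOIN (SELECT name, MAX(id) AS keep_id FROM tablea GROUP BY name) AS keep
          ON keep.name = dup.name
        WHERE dup.id <> keep.keep_id
        ORDER BY dup.id
        """
    ).fetchall()


# Two rows share a name; the third (invented) row has no duplicate.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO tablea VALUES (1, 'QuickReports'), (2, 'QuickReports'), (3, 'Charts');
    """
)
mapping = id_mapping(conn)
print(mapping)  # [(1, 2)]
```

Each pair says which Id is going away and which Id should take its place in the referencing table; names that occur only once produce no pair.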

Then using a similar statement, I just update the records:

UPDATE b
SET
    b.RefId = ToKeep.Id
FROM #TestTable ToDelete
     JOIN TableB b
        ON b.RefId = ToDelete.Id
     JOIN #TestTable ToKeep
        ON ToDelete.Name = ToKeep.Name
           AND ToKeep.RN = 1
WHERE
      ToDelete.RN > 1

Lastly, I can now delete the duplicate records:

DELETE a
FROM #TestTable b
     INNER JOIN TableA a
        ON b.Id = a.Id
WHERE
      b.RN > 1

After this, you can rerun the first SELECT statement to confirm that all duplicates have been deleted. Just remove the INTO clause so that it only returns the rows; no row should have RN greater than 1.
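
A check of this kind can also be written as two small queries. This is a sketch only, again using Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. The function names `remaining_duplicates` and `orphaned_references` are hypothetical, the sample rows represent an assumed state after the cleanup, and the orphan check is an extra that the answer itself does not mention.

```python
import sqlite3


def remaining_duplicates(conn):
    """Names that still occur more than once in tablea."""
    return conn.execute(
        "SELECT name, COUNT(*) FROM tablea GROUP BY name HAVING COUNT(*) > 1"
    ).fetchall()


def orphaned_references(conn):
    """Rows in tableb whose refid no longer matches any tablea.id."""
    return conn.execute(
        """
        SELECT b.id, b.refid
        FROM tableb AS b
        LEFT JOIN tablea AS a ON a.id = b.refid
        WHERE a.id IS NULL
        ORDER BY b.id
        """
    ).fetchall()


# Assumed state after the cleanup: one surviving name, both references repointed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, name TEXT, refid INTEGER);
    INSERT INTO tablea VALUES (2, 'QuickReports');
    INSERT INTO tableb VALUES (1, 'Test 1', 2), (2, 'Test 2', 2);
    """
)
dups = remaining_duplicates(conn)
orphans = orphaned_references(conn)
print(dups, orphans)  # [] []
```

Two empty lists mean that no name is duplicated and that every RefId still points at an existing row.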

Thanks to an anonymous colleague of mine for this solution and hope this helps someone out there.



Source: https://stackoverflow.com/questions/65340539/how-do-i-update-a-table-that-references-duplicate-records
