How to avoid inserting duplicate records when using a T-SQL Merge statement

再見小時候 2021-02-19 00:32

I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused b

3 Answers
  • 2021-02-19 01:01

    Updated for your new specification: only the highest value of col4 is inserted for each (col2, col3) pair. This time I used GROUP BY to collapse duplicate rows in the source.

    MERGE INTO dbo.tbl1 AS tbl
    USING (SELECT col2, col3, MAX(col4) AS col4 FROM #tmp GROUP BY col2, col3) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    WHEN NOT MATCHED THEN
        INSERT (col2, col3, col4)
        VALUES (src.col2, src.col3, src.col4);
    
  • 2021-02-19 01:03

    Given the source has duplicates and you aren't using MERGE fully, I'd use an INSERT.

     INSERT dbo.tbl1 (col2,col3) 
     SELECT DISTINCT col2,col3
     FROM #tmp src
     WHERE NOT EXISTS (
           SELECT *
           FROM dbo.tbl1 tbl
           WHERE tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    

    The reason MERGE fails is that matching isn't evaluated row by row. All non-matching source rows are found first, and then MERGE tries to INSERT all of them. It doesn't check whether rows within the same batch duplicate each other.
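
    The behavior described above can be reproduced with a small script. This is only a sketch: the tables #target and #source, their column types, and the unique constraint on (col2, col3) are assumptions made for illustration, since the question text does not show the real schema.

    ```sql
    -- Hypothetical tables; names, types, and the UNIQUE constraint are assumptions.
    CREATE TABLE #target (
        col2 int NOT NULL,
        col3 int NOT NULL,
        col4 int NULL,
        UNIQUE (col2, col3)
    );
    CREATE TABLE #source (col2 int NOT NULL, col3 int NOT NULL, col4 int NULL);

    -- Two source rows share the key (1, 1).
    INSERT INTO #source (col2, col3, col4)
    VALUES (1, 1, 10), (1, 1, 20), (2, 2, 30);

    BEGIN TRY
        -- Both (1, 1) rows count as NOT MATCHED, because matching is done
        -- against #target as it was before the statement started.
        MERGE INTO #target AS tbl
        USING #source AS src
        ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
        WHEN NOT MATCHED THEN
            INSERT (col2, col3, col4)
            VALUES (src.col2, src.col3, src.col4);
    END TRY
    BEGIN CATCH
        -- Report whatever error the MERGE raised.
        SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
    END CATCH;

    -- Check how many rows actually made it into the target.
    SELECT COUNT(*) AS RowsInTarget FROM #target;

    DROP TABLE #source;
    DROP TABLE #target;
    ```

    With the unique constraint in place, the MERGE should be rejected as a whole rather than inserting the first (1, 1) row and skipping the second. Without the constraint, both duplicate rows would simply be inserted.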

    This reminds me a bit of the "Halloween problem", where early data changes of an atomic operation affect later data changes: it isn't correct.

  • 2021-02-19 01:09

    Instead of GROUP BY you can use an analytic function, allowing you to select a specific record in the set of duplicate records to merge.

    MERGE INTO dbo.tbl1 AS tbl
    USING (
        SELECT *
        FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY ModifiedDate DESC) AS Rn
            FROM #tmp
        ) t
        WHERE Rn = 1    -- choose the most recently modified record
    ) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    WHEN NOT MATCHED THEN    -- insert only keys not already in tbl1
        INSERT (col2, col3, col4)
        VALUES (src.col2, src.col3, src.col4);
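
    To see which row the ROW_NUMBER filter keeps before running the MERGE, the inner query can be run on its own. The sketch below uses a hypothetical #demo table with made-up values; the column types are assumptions.

    ```sql
    -- Hypothetical staging data; table name, types, and values are assumptions.
    CREATE TABLE #demo (col2 int, col3 int, col4 int, ModifiedDate datetime2);

    INSERT INTO #demo (col2, col3, col4, ModifiedDate)
    VALUES (1, 1, 10, '2021-01-01'),
           (1, 1,  5, '2021-02-01'),   -- newer row for key (1, 1), lower col4
           (2, 2, 30, '2021-01-15');

    -- Keep one whole row per (col2, col3): the most recently modified one.
    SELECT col2, col3, col4, ModifiedDate
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY ModifiedDate DESC) AS Rn
        FROM #demo
    ) t
    WHERE Rn = 1;

    DROP TABLE #demo;
    ```

    For key (1, 1) this keeps the row with col4 = 5, because it has the later ModifiedDate, whereas GROUP BY with MAX(col4) would return 10. If two duplicates can share the same ModifiedDate, ROW_NUMBER picks one of them arbitrarily, so adding a tie-breaking column to the ORDER BY makes the choice deterministic.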
    