How to avoid inserting duplicate records when using a T-SQL Merge statement


I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused b


3 Answers

被撕碎了的回忆

2021-02-19 01:01

Updated for your new specification, which is to insert only the highest value of col4 per (col2, col3) pair. This time I used GROUP BY to collapse the duplicate rows in the source.

```
MERGE INTO dbo.tbl1 AS tbl
USING (SELECT col2, col3, MAX(col4) AS col4 FROM #tmp GROUP BY col2, col3) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);
```


醉话见心

2021-02-19 01:03
Given the source has duplicates and you aren't using MERGE fully, I'd use an INSERT.
```
 INSERT dbo.tbl1 (col2,col3) 
 SELECT DISTINCT col2,col3
 FROM #tmp src
 WHERE NOT EXISTS (
       SELECT *
       FROM dbo.tbl1 tbl
       WHERE tbl.col2 = src.col2 AND tbl.col3 = src.col3)
```
MERGE fails because matching is not evaluated row by row. It first finds every source row that has no match in the target, then tries to INSERT all of them. It does not check whether rows in the same batch duplicate each other.
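To make that behavior concrete, here is a minimal reproduction sketch. The table definitions are hypothetical (the question does not show them), and the unique constraint on (col2, col3) is an assumption about what triggers the failure.

```sql
-- Hypothetical schema: column types and the constraint name are assumptions.
CREATE TABLE dbo.tbl1 (
    col1 INT IDENTITY(1,1) PRIMARY KEY,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NULL,
    CONSTRAINT UQ_tbl1_col2_col3 UNIQUE (col2, col3)  -- assumed constraint
);

CREATE TABLE #tmp (col2 INT NOT NULL, col3 INT NOT NULL, col4 INT NULL);

INSERT INTO #tmp (col2, col3, col4)
VALUES (1, 1, 10),
       (1, 1, 20),   -- duplicate of the (1, 1) key
       (2, 5, 30);

-- Both (1, 1) rows are classified as NOT MATCHED against the empty target
-- before anything is inserted, so the statement violates the unique
-- constraint and, being atomic, inserts no rows at all.
MERGE INTO dbo.tbl1 AS tbl
USING #tmp AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);
```

Deduplicating the source first, as the answers here do, means each key reaches the NOT MATCHED branch at most once.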

This reminds me a bit of the "Halloween problem", where early data changes in an atomic operation affect later data changes in the same operation, which isn't correct behavior.

执笔经年

2021-02-19 01:09

Instead of GROUP BY you can use an analytic function, allowing you to select a specific record in the set of duplicate records to merge.

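```
MERGE INTO dbo.tbl1 AS tbl
USING (
    SELECT *
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY ModifiedDate DESC) AS Rn
        FROM #tmp
    ) t
    WHERE Rn = 1    -- choose the most recently modified record
) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
-- Assumed insert branch, using the same columns as the other answers
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);
```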
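If only inserts are needed, the same ROW_NUMBER filter could probably be combined with the plain INSERT ... WHERE NOT EXISTS pattern from the earlier answer. This is a sketch, not tested against the asker's schema; it assumes #tmp has a ModifiedDate column, and the CTE name ranked is just illustrative.

```sql
-- Assumes #tmp has a ModifiedDate column, as in the MERGE above.
-- The statement before a WITH clause must end with a semicolon.
WITH ranked AS (
    SELECT col2, col3, col4,
           ROW_NUMBER() OVER (PARTITION BY col2, col3
                              ORDER BY ModifiedDate DESC) AS Rn
    FROM #tmp
)
INSERT INTO dbo.tbl1 (col2, col3, col4)
SELECT r.col2, r.col3, r.col4
FROM ranked AS r
WHERE r.Rn = 1                      -- keep one row per (col2, col3) key
  AND NOT EXISTS (                  -- skip keys already in the target
        SELECT 1
        FROM dbo.tbl1 AS tbl
        WHERE tbl.col2 = r.col2 AND tbl.col3 = r.col3);
```

Compared with GROUP BY and MAX, this keeps every column from the single chosen row rather than aggregating each column independently.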
