I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused b
Solved to your new specification, inserting only the highest value of col4. This time I used a GROUP BY to prevent duplicate rows.
MERGE INTO dbo.tbl1 AS tbl
USING (SELECT col2, col3, MAX(col4) AS col4 FROM #tmp GROUP BY col2, col3) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
INSERT (col2,col3,col4)
VALUES (src.col2,src.col3,src.col4);
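A self-contained sketch for trying the GROUP BY version, assuming the target has a unique key on (col2, col3). The question does not show the schema. The temp table #tbl1 is a hypothetical stand-in for dbo.tbl1, and the column types and sample rows are invented. DROP TABLE IF EXISTS needs SQL Server 2016 or later.

```sql
-- Sketch only: #tbl1 is a hypothetical stand-in for dbo.tbl1; types and data are invented.
DROP TABLE IF EXISTS #tmp, #tbl1;

CREATE TABLE #tbl1 (
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NULL,
    UNIQUE (col2, col3)          -- assumed key that duplicate inserts would violate
);
CREATE TABLE #tmp (col2 INT, col3 INT, col4 INT);

-- (1, 1) appears twice in the source with different col4 values.
INSERT #tmp (col2, col3, col4) VALUES (1, 1, 10), (1, 1, 20), (2, 2, 30);

-- GROUP BY collapses the duplicates to one row per key before MERGE sees them.
MERGE INTO #tbl1 AS tbl
USING (SELECT col2, col3, MAX(col4) AS col4 FROM #tmp GROUP BY col2, col3) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);

SELECT col2, col3, col4 FROM #tbl1 ORDER BY col2, col3;
-- Expected rows: (1, 1, 20) and (2, 2, 30)
```

Because each (col2, col3) group reaches MERGE as a single row, the two source rows for (1, 1) can no longer collide with each other.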
Given that the source has duplicates and you aren't using MERGE fully (you only insert unmatched rows), I'd use a plain INSERT.
INSERT dbo.tbl1 (col2,col3)
SELECT DISTINCT col2,col3
FROM #tmp src
WHERE NOT EXISTS (
SELECT *
FROM dbo.tbl1 tbl
WHERE tbl.col2 = src.col2 AND tbl.col3 = src.col3)
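MERGE fails because it isn't evaluated row by row. All non-matching source rows are found first, and then it tries to INSERT all of them. It doesn't check whether rows in the same batch match each other, so duplicate source rows are all treated as "not matched" and collide when they are inserted.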
This reminds me a bit of the "Halloween problem", where early data changes in an atomic operation affect later data changes in the same operation. Allowing that isn't correct, so MERGE does not see its own inserts while it runs.
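A minimal repro of that failure, under stated assumptions: #tbl1 is a hypothetical stand-in for dbo.tbl1 with an assumed unique key on (col2, col3), and the sample data is made up. The source is fed into MERGE without any de-duplication, inside TRY...CATCH so the error can be inspected.

```sql
-- Sketch only: #tbl1 is a hypothetical stand-in for dbo.tbl1; types and data are invented.
DROP TABLE IF EXISTS #tmp, #tbl1;

CREATE TABLE #tbl1 (col2 INT NOT NULL, col3 INT NOT NULL, col4 INT NULL, UNIQUE (col2, col3));
CREATE TABLE #tmp (col2 INT, col3 INT, col4 INT);
INSERT #tmp (col2, col3, col4) VALUES (1, 1, 10), (1, 1, 20), (2, 2, 30);

BEGIN TRY
    -- Both (1, 1) rows are NOT MATCHED against the empty target, because matching
    -- is decided for the whole statement rather than after each individual insert.
    MERGE INTO #tbl1 AS tbl
    USING #tmp AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    WHEN NOT MATCHED THEN
        INSERT (col2, col3, col4)
        VALUES (src.col2, src.col3, src.col4);
END TRY
BEGIN CATCH
    -- Expect error 2627: violation of the unique key on (col2, col3).
    SELECT ERROR_NUMBER() AS ErrNo, ERROR_MESSAGE() AS ErrMsg;
END CATCH;

-- The MERGE is one atomic statement, so even the (2, 2) row was rolled back.
SELECT COUNT(*) AS RowsInTarget FROM #tbl1;
```

Pre-aggregating the source (GROUP BY, DISTINCT, or ROW_NUMBER) removes the collision because MERGE then only ever sees one row per key.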
Instead of GROUP BY, you can use an analytic (window) function such as ROW_NUMBER(), which lets you choose a specific record from each set of duplicates to merge.
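MERGE INTO dbo.tbl1 AS tbl
USING (
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY ModifiedDate DESC) AS Rn
FROM #tmp
) t
WHERE Rn = 1 -- choose the most recently modified record per (col2, col3)
) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN -- same INSERT as in the GROUP BY version
INSERT (col2,col3,col4)
VALUES (src.col2,src.col3,src.col4);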
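To see how this differs from MAX(col4), here is a sketch with invented data where the most recently modified duplicate has the lower col4. It assumes #tmp carries a ModifiedDate column (the answer orders by one, but its type isn't given, so DATETIME2 is a guess), and #tbl1 is a hypothetical stand-in for dbo.tbl1 with an assumed unique key on (col2, col3).

```sql
-- Sketch only: #tbl1 stands in for dbo.tbl1; ModifiedDate's type and all data are assumed.
DROP TABLE IF EXISTS #tmp, #tbl1;

CREATE TABLE #tbl1 (col2 INT NOT NULL, col3 INT NOT NULL, col4 INT NULL, UNIQUE (col2, col3));
CREATE TABLE #tmp (col2 INT, col3 INT, col4 INT, ModifiedDate DATETIME2);
INSERT #tmp (col2, col3, col4, ModifiedDate) VALUES
    (1, 1, 20, '2024-01-01'),   -- higher col4, but older
    (1, 1, 10, '2024-02-01'),   -- lower col4, newer: ROW_NUMBER() keeps this one
    (2, 2, 30, '2024-01-15');

MERGE INTO #tbl1 AS tbl
USING (
    SELECT col2, col3, col4
    FROM (
        SELECT col2, col3, col4,
               ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY ModifiedDate DESC) AS Rn
        FROM #tmp
    ) t
    WHERE Rn = 1            -- one row per (col2, col3): the newest
) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);

SELECT col2, col3, col4 FROM #tbl1 ORDER BY col2, col3;
-- Expected rows: (1, 1, 10) and (2, 2, 30); GROUP BY with MAX(col4) would give (1, 1, 20)
```

If two duplicates share the same ModifiedDate, ROW_NUMBER() picks one of them arbitrarily, so add a tiebreaker column to the ORDER BY if the choice matters.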