SqlBulkCopy Error handling / continue on error

花落未央

I am trying to insert a huge amount of data into SQL Server. My destination table has a unique index called "Hash".

I would like to replace my SqlDataAdapter impl

3 answers

误落风尘

2020-12-06 02:52
A slightly different approach from those already suggested: perform the SqlBulkCopy and catch the SqlException it throws:
```
Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate key in object 'dbo.MyTable'. The duplicate key value is (17).
```
You can then remove all items from your source from ID 17, the first record that was duplicated. I'm making assumptions here that apply to my circumstances and possibly not yours, namely that the duplication is caused by exactly the same data left over from a previously failed SqlBulkCopy (due to SQL or network errors during the upload).
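
As a minimal sketch of pulling the offending key out of that message: the snippet below is Python rather than .NET, `duplicate_key_value` is a hypothetical helper name, and it assumes the English message format shown above. In .NET the same text would come from the caught SqlException's Message property.

```python
import re

# Matches the trailing part of the duplicate-key message shown above.
# Assumption: the key value itself contains no ")" character.
_DUPLICATE_KEY = re.compile(r"The duplicate key value is \((.*?)\)")


def duplicate_key_value(message):
    """Return the duplicated key as text, or None if the message has a different shape."""
    match = _DUPLICATE_KEY.search(message)
    return match.group(1) if match else None


message = (
    "Violation of PRIMARY KEY constraint 'PK_MyPK'. Cannot insert duplicate "
    "key in object 'dbo.MyTable'. The duplicate key value is (17)."
)
first_duplicate = duplicate_key_value(message)
```

What to do with the extracted value, for example where to trim the source before retrying, depends on the assumptions described above.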

后悔当初

2020-12-06 02:59
Note: This is a recap of Sam's answer with slightly more detail.

Thanks to Sam for the answer. I have posted this as an answer because of the space limits on comments.

Building on your answer, I see two possible approaches:

Solution 1:
- Start the transaction
- Grab all possible hit "hash" values by doing "select hash from destinationTable where hash in (val1, val2, ...)"
- Filter out duplicates and report
- Insert the data
- Commit the transaction
Solution 2:
- Create a temp table that mirrors the schema of the destination table
- Bulk insert into the temp table
- Start a serializable transaction
- Get the duplicate rows: "select t.hash from tempTable t join destinationTable d on t.hash = d.hash"
- Report on the duplicate rows
- Insert the data from the temp table into the destination table: "insert into destinationTable select t.* from tempTable t left join destinationTable d on t.hash = d.hash where d.hash is null"
- Commit the transaction
Since we have two approaches, it comes down to which one is more efficient. Both approaches have to retrieve the duplicate rows and report them, while the second approach requires extra work:
- creating and dropping the temp table
- one more SQL command to move data from the temp table to the destination table
- depending on the percentage of hash collisions, it can also transfer a lot of unnecessary data across the wire
If these are the only solutions, it seems to me that the first approach wins. What do you guys think? Thanks!

遇见更好的自我

2020-12-06 03:08
SqlBulkCopy has very limited error-handling facilities; by default it doesn't even check constraints.

However, it's fast, really really fast.

If you want to work around the duplicate key issue and identify which rows in a batch are duplicates, one option is:
- start tran
- Grab a tablockx on the table, select all current "Hash" values, and chuck them in a HashSet.
- Filter out the duplicates and report.
- Insert the data
- commit tran
This process will work effectively if you are inserting huge sets and the initial data in the table is not too large.
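
The filtering step above can be sketched as follows. This is an illustration in Python with a plain set standing in for the .NET HashSet; `split_new_and_duplicate` and the row shape (dicts with a "hash" key) are assumptions made for the example, and the locking and the actual bulk insert are left out.

```python
def split_new_and_duplicate(existing_hashes, rows):
    """Partition rows into (fresh, duplicates) by their "hash" value.

    existing_hashes: hash values already present in the destination table.
    rows: incoming rows, each a dict with a "hash" key (assumed shape).
    """
    seen = set(existing_hashes)
    fresh, duplicates = [], []
    for row in rows:
        if row["hash"] in seen:
            # Already in the table, or repeated earlier in this same batch.
            duplicates.append(row)
        else:
            seen.add(row["hash"])
            fresh.append(row)
    return fresh, duplicates


incoming = [{"hash": "a"}, {"hash": "b"}, {"hash": "b"}, {"hash": "c"}]
fresh, duplicates = split_new_and_duplicate({"a"}, incoming)
```

Only `fresh` would then be handed to the bulk insert, while `duplicates` feeds the report. Adding each accepted hash to the set also catches repeats inside the batch itself.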

Can you please expand your question to include the rest of the context of the problem?

EDIT

Now that I have some more context, here is another way you can go about it:
- Do the bulk insert into a temp table.
- start serializable tran
- Select all temp rows that are already in the destination table ... report on them
- Insert the data in the temp table into the real table, performing a left join on hash and including all the new rows.
- commit the tran
That process is very light on round trips and, considering your specs, should end up being really fast.
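
A rough sketch of that flow, using Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere. The table names `destination` and `staging`, the `payload` column, and the `load_via_staging` helper are all hypothetical; with SQL Server the staging load would be the SqlBulkCopy call, and the transaction would need the serializable isolation level mentioned above, which this sketch does not model.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Load (hash, payload) rows through a temp table; return the duplicate hashes."""
    cur = conn.cursor()
    # Step 1: put the whole batch into a temp table (the bulk insert step).
    cur.execute("CREATE TEMP TABLE staging (hash TEXT, payload TEXT)")
    cur.executemany("INSERT INTO staging (hash, payload) VALUES (?, ?)", rows)
    # Step 2: find staged rows whose hash already exists in the destination.
    duplicates = [
        row[0]
        for row in cur.execute(
            "SELECT s.hash FROM staging AS s "
            "JOIN destination AS d ON d.hash = s.hash ORDER BY s.hash"
        )
    ]
    # Step 3: copy across only the rows with no match (left join, null on the right).
    cur.execute(
        "INSERT INTO destination (hash, payload) "
        "SELECT s.hash, s.payload FROM staging AS s "
        "LEFT JOIN destination AS d ON d.hash = s.hash "
        "WHERE d.hash IS NULL"
    )
    cur.execute("DROP TABLE staging")
    conn.commit()
    return duplicates
```

Note that if the incoming batch itself contains the same hash twice, the final insert would still hit the unique index, so the staged rows may need de-duplicating first.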
