BULK INSERT into a Self-Referencing Table

六月ゝ 毕业季﹏ 提交于 2019-12-10 19:29:15

问题


This question is part of a much more complex problem that I am breaking down into smaller chunks (for my sanity).

Let's say I have a Parts table that is self-referencing and looks something like this:

PartId  ParentPartId  Description             PartNumber   IsCatHeader     ProviderId
---------------------------------------------------------------------------------------
9292       null       'Engine Parts'           null           1      'Engine Parts||1'
9293       9292       'Engine Bolts'           null           1      'Engine Bolts||1'
9294       9293       '6mm Engine Bolt'        'X1-234-ABC'     0      '6mm Engine Bolt|X1-234-ABC|0'
9295       9293       '5mm Engine Bolt'        'X2-934-BCD'     0      '5mm Engine Bolt|X2-934-BCD|0'
9296       9295       '5mm Engine Bolt Washer' 'X2-934-GED'     0      '5mm Engine Bolt Washer|X2-934-GED|0'

You get the idea. Now... we are importing entire books (massive CSV files) of these parts, to the tune of several hundred line items in a single book.

Parts are often duplicated across books, and part of our job is to keep duplicates out of the database.

The source does not provide any kind of unique id for these parts, so we have created a ProviderId column that is a collection of data parts from each record that creates a unique string. We can then use this to check for duplicates as we do the import. (The actual data in this column is more complex than what I've shown here.)

So, now to my problem. I am trying to figure out the best way to do this in bulk. One option (NOT a good one) is to cycle through each item one at a time from the C# application.... insert a parent, get the SCOPE IDENTITY, insert all children, etc. Yuck. In a large book, this would result in thousands of DB calls per book. Not an option.

We need a bulk insert solution. But we have a real conundrum with the self-referencing aspect of this.

Our original thought was to build the ENTIRE data model in C#, including all of the PartId's and ParentPartId's. Then bulk insert directly into the Parts table. The problem with this, however, is knowing what ID to start with. Keep in mind, multiple processes will be running simultaneously, and many of the parts will be duplicates. We tried using a SEQUENCE object, but that presented problems... it's 100% possible that duplicate books will be processed, which would result in giant gaps in the ID's if we use a SEQUENCE.

The course I am chasing now is this... We have created a Parts_Staging table that looks almost like the actual Parts table. And we can do a bulk insert to it, no problem. Then it's a simple query to use the ProviderId column to look for records in Parts_Staging that don't exist in Parts and move them over.

But with this path, I am not creative (or experienced) enough to imagine a way to do this move / merge and to keep the self-referencing id's intact.

I've been reading threads like "How to Insert data into self reference table in sql server?" and "T-SQL - Insert Data into Parent and Child Tables" but so far I am still not seeing the vision.


回答1:


Ok. Here's what I would do. First, make up the sequence in your collection in C# along with it's parent/child relationship. But I would put that in a different column, maybe something like BatchPartId and BatchParentPartId. (Maybe a different associated table. It doesn't matter.)

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
|        |              | XX1901      |                   |
|        |              | XX1902      | XX1901            |
|        |              | XX1903      | XX1901            |
|        |              | XX1904      | XX1903            |
|        |              | XX1905      | XX1903            |

Then, insert the entire list, allowing the PartId to get created.

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
| 55     |              | XX1901      |                   |
| 56     |              | XX1902      | XX1901            |
| 57     |              | XX1903      | XX1901            |
| 58     |              | XX1904      | XX1903            |
| 59     |              | XX1905      | XX1903            |

At post, this is where you fill in the ParentPartId with the PartId of the row with the corresponding BatchPartId from BatchParentPartId of the child row.

| PartId | ParentPartId | BatchPartId | BatchParentPartId |
|--------|--------------|-------------|-------------------|
| 55     |              | XX1901      |                   |
| 56     | 55           | XX1902      | XX1901            |
| 57     | 55           | XX1903      | XX1901            |
| 58     | 57           | XX1904      | XX1903            |
| 59     | 57           | XX1905      | XX1903            |


来源:https://stackoverflow.com/questions/55049925/bulk-insert-into-a-self-referencing-table

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!